Abstract

Recently Raghavendra and Tan (SODA 2012) gave a 0.85-approximation algorithm for the
Max Bisection problem. We improve their algorithm to a 0.8776-approximation. As Max
Bisection is hard to approximate within $\alpha_{GW} + \epsilon \approx 0.8786$ under the Unique Games Conjecture
(UGC), our algorithm is nearly optimal. We conjecture that Max Bisection is approximable
within $\alpha_{GW} - \epsilon$, i.e., that the bisection constraint (essentially) does not make Max Cut harder.

We also obtain an optimal algorithm (assuming the UGC) for the analogous variant of Max 
2-Sat. Our approximation ratio for this problem exactly matches the optimal approximation 
ratio for Max 2-Sat, i.e., $\alpha_{LLZ} + \epsilon \approx 0.9401$, showing that the bisection constraint does not
make Max 2-Sat harder. This improves on a 0.93-approximation for this problem due to 
Raghavendra and Tan. 
1 Introduction



In the Max Bisection problem we are given a (weighted) graph G = (V,E), and the objective is 
to find a bisection $V = S \cup \bar{S}$, $|S| = |\bar{S}| = |V|/2$, such that the number (weight) of edges between
$S$ and $\bar{S}$ is maximized.

Max Bisection is closely related to the Max Cut problem, in which the constraint $|S| = |\bar{S}|$
is dropped. Max Cut is one of Karp's original 21 NP-Complete problems [Kar72] and is one of the 
most well-studied NP-hard problems. In a seminal work Goemans and Williamson [GW95] show 
how to use Semidefinite Programming to obtain an $\alpha_{GW} \approx 0.8786$ approximation algorithm for
Max Cut. Here we say that a (randomized) algorithm is an $\alpha$-approximation if for every graph
G it outputs a cut in which the number of edges cut is (in expectation) at least an $\alpha$ fraction
of the optimum number of edges cut. Since then, a series of results have continued the study of 
the approximability of Max Cut, by providing improved approximation ratios in special classes 
of graphs [AKK99, FKL02], integrality gaps for (strengthenings of) the Semidefinite Programming 
relaxation [FS01, KV05, KS09], and hardness of approximation results [Has01]. In a celebrated
result, Khot et al. [KKMO07] proved that, assuming the Unique Games Conjecture, it is hard to
approximate Max Cut within a factor $\alpha_{GW} + \epsilon$ for any $\epsilon > 0$. Subsequently, O'Donnell and Wu
[OW08] determined the entire "approximability curve" of Max Cut, thereby completely settling 
the approximability of Max Cut modulo the Unique Games Conjecture. 

Overall, one can think of Max Cut as a problem whose approximability has been (essentially) 
resolved. It is worthwhile to note that this mostly stems from the local nature of the problem, 
i.e., that one can analyze the value of the objective function by analyzing whether each edge is cut 
separately. In other words both feasibility and the objective value of a potential solution to Max 
Cut are very local. 

Max Bisection on the other hand has a global condition $|S| = |\bar{S}|$ determining feasibility. It
is perhaps not surprising then that settling the approximability of Max Bisection has turned out 
to be more challenging. While it is well-known and easy to see that Max Bisection is at least 
as hard to approximate as Max Cut (the reduction from Max Cut to Max Bisection simply 
outputs two disjoint copies of the graph), it is not known whether the converse holds, i.e., 

Is Max Bisection as easy to approximate as Max Cut? 

There has been a long chain of results obtaining improved approximation algorithms for Max 
Bisection. Frieze and Jerrum [FJ97], in the first nontrivial approximation algorithm, showed that 
the problem can be approximated to within a factor of 0.6514. Subsequently, Ye [Ye01], Halperin
and Zwick [HZ02], and Feige and Langberg [FL06] gave algorithms for Max Bisection with ratios 
0.699, 0.7016, and 0.7028 respectively. For the case of regular graphs, Feige et al. [FKL01] showed 
that one can improve the approximation ratio to 0.795 (or even 0.834 for 3-regular graphs). Very 
recently, in a significant improvement, Raghavendra and Tan [RT12] gave a 0.85-approximation 
algorithm (based on a computer-assisted analysis), improving upon these previous results. 

1.1 Our Contributions 

Our main contribution is a further improvement on the approximability of Max Bisection. We 
present a new approximation algorithm for Max Bisection with approximation factor $\alpha$, where $\alpha$
is the minimum of a certain function over a simple 3-dimensional polytope. Using a Matlab program
we non-rigorously estimate that $\alpha \approx 0.87765366$, and using a computer-assisted case analysis we
can formally prove this up to four digits of accuracy.

Theorem 1.1. Max Bisection is approximable in polynomial time to within a factor 0.8776. 

As mentioned above, Max Bisection is as hard as Max Cut, and hence the UGC implies that 
Max Bisection cannot be approximated to within a factor $\alpha_{GW} + \epsilon \approx 0.8786$ for any $\epsilon > 0$, so
our approximation ratio is off from the optimal by less than $10^{-3}$. As it turns out, our algorithm
has a lot of flexibility, indicating that further improvements may be possible. We remark that,
while polynomial, the running time of the algorithm is somewhat abysmal; loose estimates place
it somewhere around $O(n^{10^{100}})$; the running time of the algorithm of [RT12] is similar.

One can consider bisection-like variants of any Max CSP. We refer to the resulting problem as 
Max Bisect-CSP. For instance, in the Max Bisect-2-Sat problem, we are given a Max 2-Sat 
instance and the goal is to obtain an assignment to the variables maximizing the number of satisfied 
clauses, subject to the constraint that exactly half of the variables are set to true, and the other 
half are set to false. For Max Bisect-2-Sat, [RT12] gave a 0.93-approximation algorithm (again 
based on a computer-assisted analysis). Under the Unique Games Conjecture, the approximation 
threshold for Max 2-Sat is known to be $\alpha_{LLZ} \approx 0.9401$ [LLZ02, Aus07] and again it is easy to
prove that Max Bisect-2-Sat cannot be easier than this (see Section 6). We show that a simple
modification to the algorithm of [RT12] yields the optimal approximation ratio $\alpha_{LLZ}$ for Max
Bisect-2-Sat. 

Theorem 1.2. For every $\epsilon > 0$, Max Bisect-2-Sat can be approximated to within $\alpha_{LLZ} - \epsilon$ in
time $n^{\mathrm{poly}(1/\epsilon)}$. Here $\alpha_{LLZ} \approx 0.9401$ is the approximation threshold for Max 2-Sat.

This may seem surprising at first, but boils down to what seems to be a lucky coincidence: the 
rounding scheme of [RT12] for the semidefinite program uses a certain variant of random hyperplane 
rounding. We generalize this to a certain family of random hyperplane-based roundings, and it turns 
out that the optimal rounding scheme for Max 2-Sat already comes from this family. 

Given these results, we think it is likely that Max Bisection is essentially as easy to approximate
as Max Cut, and make the following conjecture.

Conjecture 1.3. For every $\epsilon > 0$, Max Bisection is approximable in polynomial time within a
factor $\alpha_{GW} - \epsilon$.

1.2 Techniques and Comparison to Previous Work 

All approximation algorithms for Max Bisection to date use a semidefinite programming relaxation
similar to the Goemans-Williamson algorithm for Max Cut. In its standard form, each
vertex $i$ of the graph is associated with a high-dimensional unit vector $v_i$ simulating the integral
values $\pm 1$, and the goal is to choose these vectors in such a way that pairs of vertices connected
by edges are as far apart as possible. To be more concrete the goal is to maximize the "objective
value" of the vectors defined as $\sum_{ij \in E} (1 - \langle v_i, v_j \rangle)/2$. There is also an additional balance constraint
encoding that the vectors somehow correspond to a bisection as opposed to an arbitrary cut (this 
balance constraint is not important for the high-level discussion of this section). An (essentially) 
optimal set of such vectors can be found in polynomial time, and the next step is to "round" these 
vectors to a bisection of the vertices. 
The vast majority of SDP-based approximation algorithms use a variant of random hyperplane
rounding, pioneered by Goemans and Williamson [GW95]. For Max Cut, this works as follows: a 
random hyperplane passing through the origin is chosen. This hyperplane naturally induces a cut 
of the graph: each side of the cut is defined by the vertices whose vector lies on one side of the 
hyperplane. Analyzing the resulting cut boils down to a simple local argument: one can show that 
each edge of the graph goes across the cut with probability at least $\alpha_{GW}$ times its contribution to
the objective value of the vectors. 

It is helpful to see why the same rounding does not work for Max Bisection, i.e., why the 
resulting partition is not necessarily a bisection. Although each vertex has probability 1/2 of landing 
on each side of the cut, these probabilistic events (for different vertices) are not independent. In 
fact for some vector solutions they are highly correlated. In other words although the expected size 
of each side of the cut is |V|/2, the cut may in general be very unbalanced with high probability. 
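
The failure mode described above can be seen on a toy input. The following is a minimal sketch, not part of the algorithms in this paper: the function names and the degenerate vector solution (every vertex mapped to the same unit vector) are hypothetical and chosen only for illustration.

```python
import random

def hyperplane_round(vectors, rng):
    """Round unit vectors to +/-1 by the side of a random hyperplane through the origin."""
    dim = len(vectors[0])
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # normal vector of the hyperplane
    return [1 if sum(a * b for a, b in zip(v, g)) >= 0 else -1 for v in vectors]

def imbalance(x):
    """Difference between the sizes of the two sides of the cut."""
    return abs(sum(x))

n = 8
# Hypothetical degenerate solution: every vertex is mapped to the same unit vector.
identical = [[1.0, 0.0] for _ in range(n)]

rng = random.Random(0)
trials = [hyperplane_round(identical, rng) for _ in range(200)]
# Each vertex lands on the +1 side in roughly half of the trials ...
plus_fraction = sum(x[0] == 1 for x in trials) / len(trials)
# ... yet every single cut puts all n vertices on the same side.
smallest_imbalance = min(imbalance(x) for x in trials)
```

Here every vertex is individually unbiased, but the events are perfectly correlated, so `smallest_imbalance` equals `n` in every trial.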

Most of the previous algorithms have coped with this by coming up with more sophisticated 
variants of the random hyperplane rounding that do produce a partition that is (close to) a bisection. 
On the other hand, the most recent work [RT12] took a somewhat different approach. They use a 
family of stronger SDP relaxations derived by the so-called Lasserre lift-and-project system, whose 
vector solutions enjoy nice structural properties and which can be rounded to yield an improved 
approximation ratio. As this is not the main contribution of our work, we only briefly comment on 
the Lasserre lift-and-project system and how it derives the SDP that we utilize, in Section 2.1. 

The key idea of [RT12] is that using an operation known as conditioning, the Lasserre lift-and-project
system allows us to obtain solutions to the standard SDP in which a typical pair of vertices
has very low correlation. Therefore, it essentially follows by Chebyshev's inequality that the size 
of each side of the partition produced by hyperplane rounding will be concentrated around |V|/2. 
Once such a nearly-balanced partition is found it can be adjusted to a bisection for a small additive 
loss in the number of edges cut. 

There is, however, a major caveat hiding in the word "correlation" in the paragraph above. 
There are many possible ways of defining what it means for the vectors to have "low correlation",
and the precise notion used in the algorithm of [RT12] results in rather severe constraints on
the rounding algorithm that can be applied to the vectors. In particular plain vanilla random
hyperplane rounding still does not produce a cut that is close to a bisection; if it did, we would
already have an ($\alpha_{GW} - \epsilon$)-algorithm!

In their 0.85-algorithm, [RT12] used thresholded random hyperplane rounding in the space
orthogonal to $v_0$. In this rounding, each vertex $i$ has a threshold $t_i$ which adjusts the probability
that vertex $i$ falls on a given side of the cut (by shifting the hyperplane by $t_i$ along its normal when
looking at which side of the hyperplane $v_i$ lies). How one chooses these thresholds $t_i$ is the key
to both the balance and the objective value of the resulting cut. Using a certain natural choice of 
thresholds, [RT12] show that the resulting cut is near-balanced while at the same time providing a 
good approximation ratio. The main issue that restricts their method is that their proof that the 
resulting cut is near-balanced is only applicable to their particular choice of the thresholds. 

The source of our improved approximation ratio is as follows. First, we use a stronger notion of 
what it means for an SDP solution to have "low correlation", and show that after minor modifications
the techniques of [RT12] can be extended to produce SDP solutions that have low correlation
under this stronger definition. Then, the advantage of this modification is that it buys us a lot 
of freedom to choose the thresholds for the random hyperplane rounding (though plain random 
hyperplane rounding is still not possible). This lets us propose a rich family of algorithms all of
which would result in a near-balanced cut.

As it turns out, the family of rounding algorithms is still quite restrictive. While a simple
modification to the choice of thresholds from [RT12] gives an improved ratio of 0.8736, improving 
this to 0.8776 is more challenging. As opposed to all previous similar rounding algorithms that 
we are aware of, our procedure for choosing thresholds has a combinatorial flavor. This results in 
an interesting side effect that we think is worth mentioning: two vertices $i$ and $j$ whose vectors
$v_i$ and $v_j$ are equal may be treated completely differently by the rounding algorithm, i.e., they may
have completely different probabilities of landing on each side of the cut. We are not aware of any 
previous rounding algorithms where this occurs. 

The extra flexibility that comes from this combinatorial component makes the approximation 
ratio harder to analyze. In previous algorithms, the probability that an edge $ij$ is cut only depends
on the pairwise inner products between the three vectors $v_0, v_i, v_j$. Thus computing the approximation
ratio boils down to minimizing a certain function in three variables. In our algorithm
however, the rounding thresholds $t_i$ and $t_j$ of the vertices $i$ and $j$ (and hence the probability that
the edge is cut) are not determined by these three vectors.

However, we are able to analytically remove this uncertainty and reduce the problem of computing
the approximation ratio to again minimizing a certain function in the three inner products.
Unfortunately it is not possible to compute this minimum analytically, and we resort to a
computer-assisted proof. In particular, using a computer program we can break the space of all possible values
for the inner products of $v_0, v_i, v_j$ into small cubes and then lower bound the approximation ratio of
the algorithm for each such cube. The approach is in the same spirit as those in [Zwi02, Sjö09] and
produces a rigorous (albeit very large) proof of Theorem 1.1. The details of the computer-assisted
proof are presented in Section 7.

Our results for Max Bisect-2-Sat are easier: the best algorithm for Max 2-Sat is already 
based on a thresholded random hyperplane rounding and, luckily for us, chooses thresholds in 
such a way that the resulting assignment is expected to be close to balanced. In other words the 
optimal rounding for Max 2-Sat is in our family of rounding algorithms and can be used for Max
Bisect-2-Sat. 

1.3 Organization 

The rest of the paper is organized as follows. Section 2 contains some preliminaries and sets up 
some notation. In Section 3 we describe a fairly general family of Max Bisection algorithms. 
In fact the algorithm of [RT12] is the simplest possible algorithm in our family. We then present 
a relatively simple improvement over [RT12] in Section 4. Then we give our best algorithm in 
Section 5, resulting in our final bound of 0.8776. In Section 6 we note that the algorithm of 
Section 3 can be applied to Max Bisect-CSP(P) problems in general, and in particular to Max 
Bisect-2-Sat for which we immediately obtain Theorem 1.2. We elaborate further on the details 
of our computer generated proof in Section 7. We conclude with some remarks in Section 8. 

2 Preliminaries 

For notational convenience we work with unweighted graphs throughout the paper, but we note 
that our algorithm and its analysis apply verbatim to the weighted case as well. Given a graph
G = (V, E) the Max Bisection problem can be formulated as an integer program as follows. To 
each vertex $i \in V$ we associate a variable $x_i \in \{-1, 1\}$, with the two values representing the two
different pieces of the bisection. The 0-1 indicator of whether an edge $ij \in E$ is cut can then be
written as $\frac{1 - x_i x_j}{2}$. We define

$$\mathrm{Val}(x) = \frac{1}{|E|} \sum_{ij \in E} \frac{1 - x_i x_j}{2} \in [0, 1]$$

to be the fraction of edges cut by a partition $x \in \{-1, 1\}^n$. The Max Bisection problem is then

$$
\begin{aligned}
\max\ & \mathrm{Val}(x) \\
\text{s.t.}\ & \sum_{i \in V} x_i = 0, \\
& x_i \in \{-1, 1\} \quad \forall i \in V.
\end{aligned}
$$

We denote by Opt(G) the optimum of the above program, i.e., the fraction of edges cut by the
optimal bisection.
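
To make the integer program concrete, the following minimal sketch evaluates the objective and finds the optimum by exhaustive search on tiny graphs. It assumes the value is normalized to the fraction of edges cut; the function names are illustrative, and the search is exponential in the number of vertices, so it is only a sanity check.

```python
from itertools import combinations

def val(edges, x):
    """Fraction of edges cut by the +/-1 assignment x (a list indexed by vertex)."""
    return sum((1 - x[i] * x[j]) / 2 for i, j in edges) / len(edges)

def brute_force_opt(n, edges):
    """Best value over all bisections of vertices 0..n-1 (n must be even)."""
    best_value, best_x = -1.0, None
    for side in combinations(range(n), n // 2):
        chosen = set(side)
        x = [1 if i in chosen else -1 for i in range(n)]
        value = val(edges, x)
        if value > best_value:
            best_value, best_x = value, x
    return best_value, best_x

# 4-cycle 0-1-2-3-0: the bisection {0, 2} versus {1, 3} cuts every edge.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
cycle_opt, cycle_x = brute_force_opt(4, cycle)

# Complete graph on 4 vertices: any bisection cuts 4 of the 6 edges.
k4 = list(combinations(range(4), 2))
k4_opt, k4_x = brute_force_opt(4, k4)
```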

2.1 Semidefinite Relaxation and the Lasserre System 

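By replacing the $x_i$'s with high-dimensional unit vectors $v_i$ and their products by the corresponding
inner products, we obtain the basic SDP relaxation for Max Bisection. For a set of unit vectors
$v_1, \ldots, v_n$, we write

$$\mathrm{SDPVal}(\{v_i\}) = \frac{1}{|E|} \sum_{ij \in E} \frac{1 - \langle v_i, v_j \rangle}{2}$$

for the objective function of the vectors. The basic SDP relaxation is then

$$
\begin{aligned}
\max\ & \mathrm{SDPVal}(\{v_i\}) \\
\text{s.t.}\ & \Big\| \sum_{i \in V} v_i \Big\|_2^2 = \Big\langle \sum_{i \in V} v_i, \sum_{i \in V} v_i \Big\rangle = 0, \\
& \langle v_i, v_i \rangle = 1 \quad \forall i \in V.
\end{aligned}
$$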

To strengthen the standard SDP for Max Bisection one can add variables $v_S$ for any small set
$S \subseteq V$ ($|S| \le \ell$). This variable will simulate $\prod_{i \in S} x_i$, i.e., the parity of the number of vertices
$i \in S$ on one side of the cut. If one adds a few intuitive consistency requirements on these variables
one gets an SDP relaxation which is equivalent to the so-called level-$\ell$ Lasserre strengthening of the
standard SDP.

$$
\begin{aligned}
\max\ & \mathrm{SDPVal}(\{v_i\}) \\
\text{s.t.}\ & \sum_{i \in V} \langle v_\emptyset, v_{S \Delta \{i\}} \rangle = 0 && \forall S \subset V,\ |S| < \ell, \\
& \langle v_{S_1}, v_{S_2} \rangle = \langle v_{S_3}, v_{S_4} \rangle && \forall S_1, \ldots, S_4 \subseteq V,\ |S_1|, \ldots, |S_4| \le \ell,\ S_1 \Delta S_2 = S_3 \Delta S_4, \\
& \langle v_\emptyset, v_\emptyset \rangle = 1.
\end{aligned}
$$

We write $\mathrm{SDP}_\ell(G)$ for the optimum of this semidefinite program. It is not hard to check that
this is a valid relaxation for Max Bisection, i.e., for all $\ell$, $\mathrm{SDP}_\ell(G) \ge \mathrm{Opt}(G)$.
The parameter $\ell$ for us will be a fixed constant that we will choose later. Note that the above
program can be solved in time $n^{O(\ell)} \in \mathrm{poly}(n)$ using semidefinite programming. We will use $v_i$ as
a shorthand for $v_{\{i\}}$ and $v_0$ as a shorthand for $v_\emptyset$.

We note that the above program enjoys many nice properties including a probabilistic interpretation
involving the so-called "local distributions"; however, as these are not the main focus of the
current work we refer the interested reader to [Las02] and [CT11]. We do use the following property
of the program however. The vectors satisfy the so-called triangle inequalities. In particular, if
$\ell \ge 2$, for any three vectors $u_0 = \pm v_0$, $u_1, u_2 \in \{\pm v_1, \ldots, \pm v_n\}$ the following inequality holds:

$$\|u_1 - u_2\|_2^2 \le \|u_1 - u_0\|_2^2 + \|u_0 - u_2\|_2^2. \tag{2}$$

When analyzing the algorithm, the relevant quantities turn out to be the pairwise inner products
$\langle v_i, v_0 \rangle$, $\langle v_j, v_0 \rangle$, and $\langle v_i, v_j \rangle$. For this reason, we introduce shorthand notation $\mu_i := \langle v_i, v_0 \rangle$
and $\rho_{ij} := \langle v_i, v_j \rangle$. As the $v$'s are unit vectors the inequalities (2) are equivalent to the following
inequalities for every $i, j \in [n]$:

$$
\begin{aligned}
\mu_i + \mu_j + \rho_{ij} &\ge -1, & \mu_i - \mu_j - \rho_{ij} &\ge -1, \\
-\mu_i + \mu_j - \rho_{ij} &\ge -1, & -\mu_i - \mu_j + \rho_{ij} &\ge -1.
\end{aligned}
\tag{3}
$$

This motivates the following definition.

Definition 2.1 (Configuration). We denote by $\mathrm{Conf} \subseteq [-1, 1]^3$ the polytope defined by (3) together
with the constraints $\mu_i, \mu_j, \rho_{ij} \in [-1, 1]$. A tuple $(\mu_1, \mu_2, \rho) \in \mathrm{Conf}$ is called a configuration.
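
The equivalence between the triangle inequalities (2) and the four linear inequalities defining the polytope is a short expansion. The following worked step spells out one of the four cases, writing $\mu_i = \langle v_i, v_0 \rangle$ and $\rho_{ij} = \langle v_i, v_j \rangle$ and using only that all vectors are unit vectors.

```latex
% For unit vectors a, b we have \|a - b\|_2^2 = 2 - 2\langle a, b \rangle.
% Take u_0 = v_0, u_1 = v_i, u_2 = v_j in (2):
\begin{align*}
\|v_i - v_j\|_2^2 \le \|v_i - v_0\|_2^2 + \|v_0 - v_j\|_2^2
  &\iff 2 - 2\rho_{ij} \le (2 - 2\mu_i) + (2 - 2\mu_j) \\
  &\iff -\mu_i - \mu_j + \rho_{ij} \ge -1 .
\end{align*}
% Replacing v_i by -v_i flips the signs of \mu_i and \rho_{ij}:   \mu_i - \mu_j - \rho_{ij} \ge -1.
% Replacing v_j by -v_j flips the signs of \mu_j and \rho_{ij}:  -\mu_i + \mu_j - \rho_{ij} \ge -1.
% Replacing both flips the signs of \mu_i and \mu_j only:         \mu_i + \mu_j + \rho_{ij} \ge -1.
```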

Typical rounding schemes round the vectors $v_i$ by considering their projections on a random
vector. However, while this produces a cut that is balanced in expectation, it might not be close
to balanced with high probability as vertices might be correlated. One of the main ideas in [RT12]
is the notion of vectors with low global correlation. There are many possibilities for such a notion;
[RT12] introduce a notion called $\alpha$-independence. For our algorithm, we need the following stronger
definition.

Definition 2.2 ($\epsilon$-uncorrelated SDP solution). Let $v_0, \ldots, v_n$ be a vector solution. Write $w_i = v_i - \langle v_0, v_i \rangle v_0$
for the part of $v_i$ that is orthogonal to $v_0$, and $\bar{w}_i = w_i / \|w_i\|$. Then $v_0, \ldots, v_n$ is
$\epsilon$-uncorrelated if

$$\mathbb{E}_{i,j \in V}\big[\, |\langle \bar{w}_i, \bar{w}_j \rangle| \,\big] \le \epsilon.$$

For the interested reader who is familiar with the probabilistic interpretations of the Lasserre
hierarchy, the quantity $\langle \bar{w}_i, \bar{w}_j \rangle$ precisely equals the correlation coefficient between the variables $X_i$
and $X_j$. In comparison, $\alpha$-independence used in [RT12] is defined in terms of the mutual information
of the same variables, which is within a quadratic factor of their covariance, $\langle w_i, w_j \rangle$.
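
As an illustration of the definition, the following minimal sketch computes the average absolute correlation of a given vector solution. Two conventions are assumptions of this sketch rather than statements from the text: the average runs over all ordered pairs including $i = j$, and a vertex whose orthogonal part vanishes is treated as having a fresh unit vector orthogonal to all others. The function names are illustrative.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def normalized_orthogonal_parts(v0, vectors, tol=1e-12):
    """Return w_i / ||w_i|| with w_i = v_i - <v0, v_i> v0, or None when w_i vanishes."""
    result = []
    for v in vectors:
        mu = dot(v0, v)
        w = [a - mu * b for a, b in zip(v, v0)]
        norm = math.sqrt(dot(w, w))
        result.append([a / norm for a in w] if norm > tol else None)
    return result

def average_abs_correlation(v0, vectors):
    """Average of |<w_bar_i, w_bar_j>| over all ordered pairs (i, j), including i = j."""
    bars = normalized_orthogonal_parts(v0, vectors)
    n = len(bars)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if bars[i] is None or bars[j] is None:
                # Assumed convention: a vanishing w_i acts like a fresh orthogonal unit vector.
                total += 1.0 if i == j else 0.0
            else:
                total += abs(dot(bars[i], bars[j]))
    return total / (n * n)

v0 = [1.0, 0.0, 0.0]
# Two antipodal pairs in the plane orthogonal to v0.
antipodal = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]
measure = average_abs_correlation(v0, antipodal)
```

A solution is $\epsilon$-uncorrelated in this sense when the returned value is at most $\epsilon$; the small example above is far from uncorrelated because half of all pairs are perfectly (anti)correlated.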

2.2 Normal Distributions 

Throughout the paper, we write $\phi(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ for the density function of a standard normal
random variable, $\Phi(x) = \int_{y=-\infty}^{x} \phi(y)\,dy$ for its CDF, and $\Phi^{-1} : [0, 1] \to [-\infty, \infty]$ for the inverse of $\Phi$.
We also make use of the following standard fact about projections of Gaussians onto vectors.

Fact 2.3. Let $u_1, \ldots, u_t \in \mathbb{R}^n$, $g$ an $n$-dimensional standard Gaussian vector, and $z_i = \langle u_i, g \rangle$.
Then $z_1, \ldots, z_t$ are jointly Gaussian random variables with expectation $0$ and covariances $\mathrm{Cov}[z_i, z_j] = \langle u_i, u_j \rangle$.

We also need notation for the CDF of the bivariate normal distribution.

Definition 2.4. Let $\rho \in [-1, 1]$. We define $\Gamma_\rho : [0, 1]^2 \to [0, 1]$ by

$$\Gamma_\rho(x, y) = \Pr\left[ X \le \Phi^{-1}(x) \wedge Y \le \Phi^{-1}(y) \right],$$

where $X$ and $Y$ are jointly normal random variables with mean $0$ and covariance matrix
$\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$.

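The following parametrization of the $\Gamma$ function is convenient when analyzing our algorithms.
The motivation will become clear in Section 3.2.

Definition 2.5. For $\rho \in [-1, 1]$, recall the definition of $\Gamma_\rho : [0, 1]^2 \to [0, 1]$ and define
$\Lambda_\rho : [-1, 1]^2 \to [-1, 1]$ as

$$\Lambda_\rho(r_1, r_2) = 2\,\Gamma_\rho\!\left( \frac{1 - r_1}{2}, \frac{1 - r_2}{2} \right) + \frac{r_1}{2} + \frac{r_2}{2}.$$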

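We now state three lemmata about $\Gamma_\rho$ that turn out to be useful for us. For completeness,
proofs can be found in Appendix A.

Lemma 2.6. For every $\rho \in [-1, 1]$, $q_1, q_2 \in [0, 1]$, we have

$$\Gamma_\rho(1 - q_1, 1 - q_2) = \Gamma_\rho(q_1, q_2) + 1 - q_1 - q_2.$$

Lemma 2.7. For every $\rho \in [-1, 1]$, $q_1, q_2 \in [0, 1]$, we have

$$\Gamma_\rho(q_1, q_2) \le q_1 q_2 + 2|\rho|.$$

Lemma 2.8. For every $\rho \in (-1, 1)$, $q_1, q_2 \in [0, 1]$, we have

$$\frac{\partial}{\partial q_1} \Gamma_\rho(q_1, q_2) = \Phi\!\left( \frac{t_2 - \rho\, t_1}{\sqrt{1 - \rho^2}} \right),$$

where $t_i = \Phi^{-1}(q_i)$.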
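
The lemmata can be checked numerically. The sketch below is an illustration and not the method of Appendix A: it estimates the bivariate normal CDF of Definition 2.4 by integrating the conditional probability $\Pr[Y \le \Phi^{-1}(q_2) \mid X = \Phi^{-1}(u)]$ over $u \in (0, q_1)$ with a midpoint rule. The function name, the step count, and the test values are assumptions, and the routine requires $|\rho| < 1$.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal: Phi is STD.cdf, Phi^{-1} is STD.inv_cdf

def gamma(rho, q1, q2, steps=4000):
    """Midpoint-rule estimate of the bivariate normal CDF Gamma_rho(q1, q2), |rho| < 1."""
    if q1 <= 0.0 or q2 <= 0.0:
        return 0.0
    if q2 >= 1.0:
        return min(q1, 1.0)
    t2 = STD.inv_cdf(q2)
    scale = math.sqrt(1.0 - rho * rho)
    h = min(q1, 1.0) / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * h  # midpoint of the k-th subinterval, strictly inside (0, 1)
        total += STD.cdf((t2 - rho * STD.inv_cdf(u)) / scale)
    return total * h

rho, q1, q2 = 0.5, 0.3, 0.6
# Lemma 2.6: both sides should agree up to integration error.
lhs_26 = gamma(rho, 1 - q1, 1 - q2)
rhs_26 = gamma(rho, q1, q2) + 1 - q1 - q2
# Lemma 2.7: Gamma exceeds the independent value q1 * q2 by at most 2 |rho|.
gap_27 = gamma(rho, q1, q2) - q1 * q2
# Classical orthant probability: Gamma_rho(1/2, 1/2) = 1/4 + arcsin(rho) / (2 pi).
quadrant = gamma(rho, 0.5, 0.5)
```

Because the integrand is monotone in `u` and bounded by 1, the midpoint rule has error at most `q1 / (2 * steps)`, which is ample for a sanity check of this kind.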



3 A Family Of Bisection Algorithms 

In this section we describe a general family of rounding algorithms for Max Bisection. We first 
describe the following lemma that we need for our algorithm. 

Lemma 3.1. There is an algorithm which, given an integer $t$ and a graph $G = (V,E)$, runs in
time $n^{O(t)}$ and outputs a set of unit vectors $v_0, \ldots, v_n$ such that

1. $\mathrm{SDPVal}(\{v_i\}) \ge \mathrm{Opt}(G) - 10\,t^{-1/12}$,

2. $\sum_i \langle v_0, v_i \rangle = 0$,

3. The triangle inequalities (3) are satisfied,

4. The vectors $v_0, v_1, \ldots, v_n$ are $t^{-1/4}$-uncorrelated.



Lemma 3.1, which we prove in Section 3.4 below, is analogous to Theorem 4.6 in the full version
of [RT12]. The main difference is in item 4. As mentioned in Section 2.1, [RT12] uses the notion
of $\alpha$-independence which bounds the average mutual information in an average pair of variables
$i, j$, corresponding (up to a quadratic factor) to bounding the average covariance between a pair
of variables, whereas the notion of $\epsilon$-uncorrelation (Definition 2.2) bounds the average correlation
coefficient. We need this stronger property of the vectors because we use a more general family of
rounding functions than [RT12].

The Max Bisection algorithm is presented in Algorithm 1. It uses a random hyperplane 
rounding that is parameterized by a second algorithm, which we refer to as a bias selection algorithm. 

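Algorithm 1 Max Bisection algorithm

Input: Graph $G = (V,E)$, parameter $\epsilon > 0$, bias selection algorithm SelectBias
Output: Assignment $y \in \{-1, 1\}^n$ satisfying $\sum_i y_i = 0$
1: Run the procedure of Lemma 3.1 with $t = (20/\epsilon)^{12}$ to get vectors $v_0, \ldots, v_n$
2: $\mu_i \leftarrow \langle v_0, v_i \rangle$, $w_i \leftarrow v_i - \mu_i v_0$, $\bar{w}_i \leftarrow \begin{cases} w_i / \|w_i\|_2 & \text{if } \|w_i\|_2 \ne 0, \\ \text{a unit vector orthogonal to all other vectors} & \text{if } \|w_i\|_2 = 0 \end{cases}$
3: $(r_1, \ldots, r_n) \leftarrow \mathrm{SelectBias}(\mu_1, \ldots, \mu_n)$
4: $g \leftarrow$ standard $n$-dimensional Gaussian vector
5: $x_i \leftarrow \begin{cases} -1 & \text{if } \langle \bar{w}_i, g \rangle \le \Phi^{-1}\!\left( \frac{1 - r_i}{2} \right), \\ 1 & \text{otherwise} \end{cases}$
6: $b \leftarrow \frac{1}{2} \sum_{i \in V} x_i$ (the imbalance of $x$)
7: $S \leftarrow$ a uniformly random set of $|b|$ vertices $i$ s.t. $x_i = \mathrm{sign}(b)$
8: $y_i \leftarrow \begin{cases} x_i & \text{if } i \notin S, \\ -x_i & \text{if } i \in S \end{cases}$
return $y_1, \ldots, y_n$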
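
The rounding part of Algorithm 1 (steps 2 to 8) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used for the results in this paper: it takes the vectors as given instead of solving the SDP of step 1, assumes an even number of vertices, draws an independent Gaussian for any vertex whose orthogonal part vanishes, and uses a hypothetical `linear_bias` rule as the bias selection algorithm.

```python
import math
import random
from statistics import NormalDist

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def linear_bias(mus, c=1.0):
    """Example bias selection: r_i = c * mu_i, which sums to 0 whenever the mu_i do."""
    return [c * m for m in mus]

def round_to_bisection(v0, vectors, select_bias, rng):
    """Sketch of steps 2-8 of Algorithm 1 for given unit vectors (even number of vertices)."""
    std = NormalDist()
    n = len(vectors)
    mus = [dot(v0, v) for v in vectors]                    # step 2: mu_i
    biases = select_bias(mus)                              # step 3
    g = [rng.gauss(0.0, 1.0) for _ in range(len(v0))]      # step 4
    x = []
    for v, mu, r in zip(vectors, mus, biases):             # step 5
        w = [a - mu * b for a, b in zip(v, v0)]
        norm = math.sqrt(dot(w, w))
        # A vanishing w_i stands for a fresh orthogonal unit vector, whose
        # projection on g is an independent standard Gaussian.
        z = dot(w, g) / norm if norm > 1e-12 else rng.gauss(0.0, 1.0)
        q = (1.0 - r) / 2.0                                # target Pr[x_i = -1]
        if q <= 0.0:
            x.append(1)
        elif q >= 1.0:
            x.append(-1)
        else:
            x.append(-1 if z <= std.inv_cdf(q) else 1)
    b = sum(x) // 2                                        # step 6: imbalance (sum is even)
    y = list(x)
    if b != 0:
        majority = 1 if b > 0 else -1
        candidates = [i for i in range(n) if x[i] == majority]
        for i in rng.sample(candidates, abs(b)):           # steps 7-8: flip |b| vertices
            y[i] = -y[i]
    return y

v0 = [1.0, 0.0, 0.0]
vectors = [[0.0, 1.0, 0.0], [0.0, -1.0, 0.0], [0.6, 0.8, 0.0],
           [-0.6, 0.0, 0.8], [1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]
y = round_to_bisection(v0, vectors, linear_bias, random.Random(1))
```

The output is always an exact bisection because step 8 flips exactly $|b|$ vertices on the larger side; how many edges survive that correction is what the balance analysis has to control.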



To understand the bias selection algorithm, first note that by Fact 2.3 the value $\langle \bar{w}_i, g \rangle$, used
to determine the value of $x_i$ in step 5, is a standard Gaussian random variable. It then follows that
$\mathbb{E}[x_i] = r_i$, i.e., $r_i$ is precisely the bias of $x_i$ produced by the rounding algorithm.

Thus, in order for the intermediate cut $x$ to be balanced in expectation, we require that the
output of the bias selection algorithm satisfies $\sum_i r_i = 0$. This could be relaxed to only requiring
that $|\sum_i r_i| \le \epsilon n$, but we do not need this relaxed notion. The bias selection algorithm can be
randomized, in which case we would require that $\Pr[\sum_i r_i = 0] \approx 1$. In principle, the bias selection
algorithm is allowed to look at the SDP solution $v_0, \ldots, v_n$ as well as $G$, but our bias selection
algorithm only uses $\mu_1, \ldots, \mu_n$. Notice that item 2 of Lemma 3.1 implies that $\sum_i \mu_i = 0$.

Varying the bias selection algorithm gives rise to different rounding algorithms, and the question
is how to efficiently find $r_i$'s that give a good approximation ratio. The Raghavendra-Tan algorithm,
achieving an approximation ratio of 0.85, can be expressed in this framework as choosing
$r_i = \mu_i$. In general, it would be natural to let $r_i$ depend solely on $\mu_i$, i.e., $r_i := f(\mu_i)$ for some
function $f : [-1, 1] \to [-1, 1]$. However, because of the balance requirement $\sum_i r_i = 0$, the function
$f$ would need to be linear, resulting in a quite limited family of roundings. Nevertheless, as we
shall see in Section 4, this kind of rounding is sufficient to obtain a 0.8736-approximation.

In order to improve upon this, the choice of $r_i$ needs to look at more than just $\mu_i$. In Section 5,
we devise a 0.8776-algorithm. There the bias selection algorithm starts by setting $r_i = c \cdot \mu_i$ but then
adjusts some of the $r_i$ values in a controlled way so as to preserve $\sum_i r_i = 0$. Somewhat curiously,
an effect of our rounding scheme is that two vertices $i$ and $j$ of the graph with the same vectors
$v_i = v_j$ can be rounded differently by the algorithm. We are not aware of previous algorithms
where this happens.

3.1 Overview of Analysis 

To analyze the algorithm, first notice that from Lemma 3.1 $\mathrm{SDPVal}(\{v_i\}) \ge \mathrm{Opt}(G) - \epsilon/2$. Thus,
it suffices to lower bound the value of the resulting bisection $y$ in terms of $\mathrm{SDPVal}(\{v_i\})$.

Now consider the intermediate cut $x$ of Algorithm 1. When constructing $x$, the behaviour of
the algorithm on an edge $ij \in E$ depends solely on the pairwise inner products $\mu_i$, $\mu_j$, $\rho_{ij}$ and
the two biases $r_i$ and $r_j$. Notice that by Lemma 3.1 $(\mu_i, \mu_j, \rho_{ij})$ is a configuration as defined in
Definition 2.1. We would then like to compute the "relative contribution" $\alpha : \mathrm{Conf} \times [-1, 1]^2 \to \mathbb{R}_{\ge 0}$
defined such that $\alpha(\mu_i, \mu_j, \rho_{ij}, r_i, r_j)$ is the contribution of the edge $(i, j)$ to the value of the rounded
solution divided by its contribution to the value of the SDP solution $v_0, \ldots, v_n$. In other words we
define $\alpha$, somewhat informally, as

$$\alpha(\mu_i, \mu_j, \rho_{ij}, r_i, r_j) = \frac{\Pr[x_i \ne x_j \mid \mu_i, \mu_j, \rho_{ij}, r_i, r_j]}{(1 - \rho_{ij})/2}.$$

A formal definition appears in Section 3.2, Definition 3.5. Given this definition, the following lemma,
which lower bounds the value of the cut $x$, is intuitively obvious. We prove it in Section 3.2.

Lemma 3.2. Suppose that for every edge $ij \in E$, it holds that $\alpha(\mu_i, \mu_j, \rho_{ij}, r_i, r_j) \ge \alpha$, for some
$\alpha \ge 0$. Then the assignment $x$ produced by Algorithm 1 satisfies

$$\mathbb{E}[\mathrm{Val}(x)] \ge \alpha \cdot \mathrm{SDPVal}(\{v_i\}).$$

Finally, we need to show that the balancing step at the end of the algorithm only incurs a small 
loss. The following lemma, proved in Section 3.3, establishes this. The main idea is to show that 
most of the time the solution x is not too unbalanced to begin with. 

Lemma 3.3. Consider Algorithm 1 and suppose the biases selected in step 3 satisfy $\sum_i r_i = 0$.
Then, it holds that

$$\mathbb{E}[\mathrm{Val}(y)] \ge \mathbb{E}[\mathrm{Val}(x)] - \epsilon/2.$$

Taken together, Lemmata 3.1, 3.2 and 3.3 imply that Algorithm 1 is an $(\alpha - \epsilon)$-approximation
algorithm for Max Bisection, so the main crux is understanding the function $\alpha : \mathrm{Conf} \times [-1, 1]^2 \to \mathbb{R}_{\ge 0}$.
We note that the running time of the algorithm is $n^{O(1/\epsilon^{12})}$.
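
As a reading aid, the three lemmata can be chained as follows. The sketch assumes $\alpha \le 1$ and uses $t = (20/\epsilon)^{12}$ from step 1 of Algorithm 1; it records the guarantee in its additive form and is not a replacement for the proofs.

```latex
\begin{align*}
10\, t^{-1/12} &= 10 \cdot \frac{\epsilon}{20} = \frac{\epsilon}{2}, \\
\mathbb{E}[\mathrm{Val}(y)]
  &\ge \mathbb{E}[\mathrm{Val}(x)] - \frac{\epsilon}{2}
     && \text{(Lemma 3.3)} \\
  &\ge \alpha \cdot \mathrm{SDPVal}(\{v_i\}) - \frac{\epsilon}{2}
     && \text{(Lemma 3.2)} \\
  &\ge \alpha \Big( \mathrm{Opt}(G) - \frac{\epsilon}{2} \Big) - \frac{\epsilon}{2}
     && \text{(Lemma 3.1, item 1)} \\
  &\ge \alpha \cdot \mathrm{Opt}(G) - \epsilon
     && \text{(using } \alpha \le 1 \text{)}.
\end{align*}
```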

3.2 Analysis of Approximation Ratio 

In this section we elaborate further on the definition of the function $\alpha$, and Lemma 3.2. First we
express the probability that two vertices are on the same side of the cut. Recall Definition 2.5 of
the function $\Lambda_\rho$.
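Lemma 3.4. Consider Algorithm 1 and any pair of variables $x_i, x_j$. Denote $\tilde{\rho} = \langle \bar{w}_i, \bar{w}_j \rangle$. Then,

$$\Pr[x_i = x_j] = \Lambda_{\tilde{\rho}}(r_i, r_j).$$

Proof. By Fact 2.3, $\langle \bar{w}_i, g \rangle$ and $\langle \bar{w}_j, g \rangle$ are jointly normal with covariance $\tilde{\rho}$ and variance 1. Thus,

$$
\begin{aligned}
\Pr[x_i = -1 \wedge x_j = -1] &= \Gamma_{\tilde{\rho}}\!\left( \tfrac{1 - r_i}{2}, \tfrac{1 - r_j}{2} \right), \\
\Pr[x_i = 1 \wedge x_j = 1] &= \Pr\left[ \langle \bar{w}_i, g \rangle > \Phi^{-1}\!\left( \tfrac{1 - r_i}{2} \right) \wedge \langle \bar{w}_j, g \rangle > \Phi^{-1}\!\left( \tfrac{1 - r_j}{2} \right) \right] \\
&= \Pr\left[ \langle \bar{w}_i, g \rangle < -\Phi^{-1}\!\left( \tfrac{1 - r_i}{2} \right) \wedge \langle \bar{w}_j, g \rangle < -\Phi^{-1}\!\left( \tfrac{1 - r_j}{2} \right) \right] \\
&= \Gamma_{\tilde{\rho}}\!\left( \tfrac{1 + r_i}{2}, \tfrac{1 + r_j}{2} \right),
\end{aligned}
$$

where the middle step used that $g$ and $-g$ have the same distribution and the last step used
$\Phi(-x) = 1 - \Phi(x)$. Using Lemma 2.6 we get

$$\Pr[x_i = x_j] = \Pr[x_i = -1 \wedge x_j = -1] + \Pr[x_i = 1 \wedge x_j = 1] = 2\,\Gamma_{\tilde{\rho}}\!\left( \tfrac{1 - r_i}{2}, \tfrac{1 - r_j}{2} \right) + \frac{r_i}{2} + \frac{r_j}{2},$$

which equals $\Lambda_{\tilde{\rho}}(r_i, r_j)$ by Definition 2.5. □

We are now ready to give a definition of the function $\alpha$ described above.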



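Definition 3.5. For a configuration $(\mu_1, \mu_2, \rho) \in \mathrm{Conf}$, and for $r_1, r_2 \in [-1, 1]$, let
$\tilde{\rho} = \frac{\rho - \mu_1 \mu_2}{\sqrt{1 - \mu_1^2}\,\sqrt{1 - \mu_2^2}}$
(if $\mu_1 = \pm 1$ or $\mu_2 = \pm 1$ we let $\tilde{\rho} = 0$), and define

$$\alpha(\mu_1, \mu_2, \rho, r_1, r_2) = \frac{2 \left( 1 - \Lambda_{\tilde{\rho}}(r_1, r_2) \right)}{1 - \rho}.$$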

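From the discussion above, we can immediately deduce Lemma 3.2.

Proof of Lemma 3.2. When we run Algorithm 1, the probability we cut an edge $ij$ is

$$\Pr[x_i \ne x_j] = 1 - \Pr[x_i = x_j] = \frac{1 - \rho_{ij}}{2}\, \alpha(\mu_i, \mu_j, \rho_{ij}, r_i, r_j) \ge \frac{1 - \rho_{ij}}{2}\, \alpha.$$

We remind the reader that $\frac{1 - \rho_{ij}}{2}$ is exactly the contribution of the pair $ij$ to $\mathrm{SDPVal}(\{v_i\})$, hence,

$$\mathbb{E}[\mathrm{Val}(x)] = \frac{1}{|E|} \sum_{ij \in E} \Pr[x_i \ne x_j] \ge \alpha \cdot \mathrm{SDPVal}(\{v_i\}). \qquad \square$$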

3.3 Analysis of Balance 

In this section we prove Lemma 3.3. The lemma follows immediately from the following lemma. 

Lemma 3.6. Consider Algorithm 1 and assume that the biases selected in step 3 satisfy $\sum_i r_i = 0$.
Then the assignment $x$ chosen in step 5 satisfies

$$\Pr_g\left[\, \Big| \sum_{i \in V} x_i \Big| > \epsilon n / 10 \,\right] \le \epsilon / 10.$$
This lemma is analogous to (but does not follow from) Theorem 5.6 and Corollary 5.7 in the full
version of [RT12]. The main difference is that our lemma applies to any choice of biases, whereas
Theorem 5.6 in [RT12] requires $r_i = \mu_i$. This is enabled by our stronger notion of an uncorrelated
SDP solution, i.e., Lemma 3.1. The proof of Theorem 5.6 in [RT12] is an elegant use of information-theoretic
techniques ultimately relying on the so-called data processing inequality. While one can
easily extend that proof to our setting, we give a different proof which is somewhat longer, but in
our opinion more transparent and resulting in somewhat better bounds as we do not need to pass
back and forth between covariance and mutual information.

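Proof of Lemma 3.6. Define the random variable $X = \frac{1}{n} \sum_{i \in V} x_i$. We have $\mathbb{E}[X] = \frac{1}{n} \sum_i r_i = 0$. We
now bound $\mathrm{Var}[X] = \mathbb{E}_{i,j \in V}[\mathrm{Cov}[x_i, x_j]]$ from which the desired bound follows using Chebyshev's
inequality. Let $\tau = 1/t^{1/4}$ and recall that by Lemma 3.1 we have

$$\mathbb{E}_{i,j \in V}\big[\, |\langle \bar{w}_i, \bar{w}_j \rangle| \,\big] \le \tau.$$

Fix some pair $i, j$ and let $\tilde{\rho} = \langle \bar{w}_i, \bar{w}_j \rangle$. By Lemma 3.4 we have

$$\mathrm{Cov}[x_i, x_j] = \mathbb{E}[x_i x_j] - \mathbb{E}[x_i]\,\mathbb{E}[x_j] = 2 \Pr[x_i = x_j] - 1 - r_i r_j = 2 \Lambda_{\tilde{\rho}}(r_i, r_j) - 1 - r_i r_j = 4 \Gamma_{\tilde{\rho}}(q_i, q_j) - 4 q_i q_j,$$

where $q_i = \frac{1 - r_i}{2} = \Pr[x_i = -1]$. By Lemma 2.7, it follows that $\mathrm{Cov}[x_i, x_j] \le 8|\tilde{\rho}|$. Averaging over
all pairs $i, j$ we get

$$\mathrm{Var}[X] = \mathbb{E}_{i,j \in V}[\mathrm{Cov}[x_i, x_j]] \le \mathbb{E}_{i,j \in V}\big[\, 8 |\langle \bar{w}_i, \bar{w}_j \rangle| \,\big] \le 8\tau.$$

Denoting by $\sigma = \sqrt{\mathrm{Var}[X]}$ it follows from Chebyshev's inequality that

$$\Pr\big[\, |X| > \sigma^{2/3} \,\big] \le \sigma^{2/3}.$$

Plugging in our bound $\sigma^2 \le 8\tau = 8\,t^{-1/4}$ we have $\sigma^{2/3} \le 2\,t^{-1/12} = \epsilon/10$, completing the proof. □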
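
For readers who want the arithmetic of the last two steps spelled out, the following worked computation uses only Chebyshev's inequality, the variance bound $\sigma^2 \le 8\,t^{-1/4}$, and the choice $t = (20/\epsilon)^{12}$ from step 1 of Algorithm 1. Here $X$ denotes the average $\frac{1}{n}\sum_{i \in V} x_i$, which has mean $0$.

```latex
\begin{align*}
\Pr\big[\, |X| > \sigma^{2/3} \,\big]
  &\le \frac{\operatorname{Var}[X]}{(\sigma^{2/3})^2}
   = \frac{\sigma^2}{\sigma^{4/3}} = \sigma^{2/3}, \\
\sigma^{2/3} = (\sigma^2)^{1/3}
  &\le \big( 8\, t^{-1/4} \big)^{1/3} = 2\, t^{-1/12}
   = 2 \cdot \frac{\epsilon}{20} = \frac{\epsilon}{10}, \\
|X| > \frac{\epsilon}{10}
  &\iff \Big| \sum_{i \in V} x_i \Big| > \frac{\epsilon n}{10}.
\end{align*}
```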

3.4 Finding the Uncorrelated SDP Solution 

In this section we prove Lemma 3.1 which is restated below for convenience. 

Lemma 3.1 (restated). There is an algorithm which, given an integer $t$ and a graph $G = (V,E)$,
runs in time $n^{O(t)}$ and outputs a set of vectors $v_0, \ldots, v_n$ such that

1. $\mathrm{SDPVal}(\{v_i\}) \ge \mathrm{Opt}(G) - 10\,t^{-1/12}$,

2. $\sum_i \langle v_0, v_i \rangle = 0$,

3. The triangle inequalities (3) are satisfied,

4. The vectors $v_1, \ldots, v_n$ are $t^{-1/4}$-uncorrelated with respect to $v_0$.
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Proof. Without loss of generality assume that t is a sufficiently large constant and construct ran- 
dom vectors Vq, v^, . . . , v' n by applying Lemma 4.5 from the full version of [RT12]. The expected 
objective value of v^'s (where the expectation is over the randomness in their Lemma 4.5) equals 
SDP(+2(G) > Opt(G). Furthermore, the same lemma guarantees that the average mutual informa- 
tion of certain random variables (associated with the vectors), indicated by I(Xi,Xj), is low. We 
will not define mutual information and refer the interested reader to [RT12] as we can immediately 
apply Lemma 5.2 from that paper which relates I(Xi,Xj) to |(w£, w^)| to arrive at the following 
conclusion. 



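$$\mathbb{E}_{\{v'_i\}}\Big[\mathbb{E}_{i,j \in V}\big[\,|\langle w'_i, w'_j \rangle|\,\big]\Big] \le \mathbb{E}_{\{v'_i\}}\Big[\mathbb{E}_{i,j \in V}\big[\sqrt{32\, I(X_i, X_j)}\big]\Big] \le \sqrt{32\, \mathbb{E}_{\{v'_i\}}\, \mathbb{E}_{i,j \in V}\big[I(X_i, X_j)\big]} \le \sqrt{\frac{32}{t-1}} \le \frac{6}{\sqrt{t}},$$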



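where the $w'_i$'s are defined from the $v'_i$'s analogously to how the $w_i$'s are defined from the $v_i$'s. Furthermore the $v'_i$'s are
a solution to the level-2 Lasserre relaxation of Max Bisection and in particular satisfy $\sum_i \langle v'_0, v'_i \rangle = 0$
along with all triangle inequalities. Applying two Markov bounds we conclude,

$$\Pr_{\{v'_i\}}\left[\mathrm{SDPVal}(\{v'_i\}) < \mathrm{Opt}(G) - 9t^{-1/12}\right] \le 1 - 9t^{-1/12}, \qquad \Pr_{\{v'_i\}}\left[\mathbb{E}_{i,j \in V}\big[\,|\langle w'_i, w'_j \rangle|\,\big] > t^{-5/12}\right] \le 6t^{-1/12}.$$

Thus, by resampling $v'_0, v'_1, \ldots, v'_n$ an (expected) $O(t^{1/12})$ times we obtain an SDP solution where all
the following three conditions as well as the triangle inequalities on the $v'_i$'s hold:

$$\mathrm{SDPVal}(\{v'_i\}) \ge \mathrm{Opt}(G) - 9t^{-1/12}, \qquad \mathbb{E}_{i,j \in V}\big[\,|\langle w'_i, w'_j \rangle|\,\big] \le t^{-5/12}, \qquad \sum_i \langle v'_0, v'_i \rangle = 0.$$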



The above vectors have all the required conditions of the Lemma except the last. In particular,
we have a bound on the average of the inner products $\langle w'_i, w'_j \rangle$ as opposed to the stronger bound
on the normalized inner products $\langle \bar{w}'_i, \bar{w}'_j \rangle$. But the only way in which the stronger bound can fail to hold is
if many $\|w'_i\|$'s are small, i.e., if many of the values $|\mu'_i|$ are close to 1. However, such vectors can
be corrected for a small additional loss in SDP value.

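In particular, define vectors $v_0, \ldots, v_n$ as $v_0 = v'_0$ and

$$v_i = \begin{cases} v'_i & \text{if } \|w'_i\| \ge t^{-1/12}, \\ v'_i - w'_i + w^*_i & \text{otherwise,} \end{cases}$$

for $i \ge 1$. Here $w^*_i$ is a new vector orthogonal to all other vectors and of length $\|w^*_i\| = \|w'_i\|$.
Notice that now,

$$|\langle \bar{w}_i, \bar{w}_j \rangle| = \begin{cases} \dfrac{|\langle w'_i, w'_j \rangle|}{\|w'_i\|\,\|w'_j\|} & \text{if } \min(\|w'_i\|, \|w'_j\|) \ge t^{-1/12}, \\ 0 & \text{otherwise,} \end{cases}$$

which in particular is bounded by $|\langle w'_i, w'_j \rangle| / t^{-2/12}$ and therefore

$$\mathbb{E}_{i,j \in V}\big[\,|\langle \bar{w}_i, \bar{w}_j \rangle|\,\big] \le t^{-5/12} / t^{-2/12} = t^{-1/4}.$$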



Furthermore we can bound the difference in any inner product by

$$|\langle v_i, v_j \rangle - \langle v'_i, v'_j \rangle| = |\langle w_i, w_j \rangle - \langle w'_i, w'_j \rangle| \le t^{-1/12},$$

and so $\mathrm{SDPVal}(\{v_i\}) \ge \mathrm{SDPVal}(\{v'_i\}) - t^{-1/12}$. Clearly, the condition $\sum_i \langle v_0, v_i \rangle = 0$ is still satisfied
since all projections on $v_0$ remain the same. It remains to check the triangle inequalities. Consider
any pair $v_i, v_j$ such that one of them was changed. We then have $\rho_{ij} = \langle v_i, v_j \rangle = \mu_i \mu_j$, and the
four inequalities (3) are equivalent to

$$(1 \pm \mu_i)(1 \pm \mu_j) \ge 0,$$

which clearly hold. □

Figure 1: Plot of $\alpha(c)$ for the entire range and zoomed in around the optimal $c$. Panels: (a) range $0 \le c \le 1$; (b) range $0.83 \le c \le 0.88$.



4 Linear Biases: A 0.8736-Approximation

In this section we study how far one can get by considering bias selection algorithms that set $r_i$ to
be a linear function of $\mu_i$. Recall that the Raghavendra-Tan algorithm, which uses $r_i = \mu_i$, falls
into this category.

Definition 4.1. For $c \in [0, 1]$, define

$$\alpha(c) := \min_{(\mu_1, \mu_2, \rho) \in \mathrm{Conf}} \alpha(\mu_1, \mu_2, \rho, c \cdot \mu_1, c \cdot \mu_2).$$

The following Lemma is an immediate corollary of the analysis in Section 3.1.

Lemma 4.2. For any $0 \le c \le 1$, Algorithm 1 with the bias selection algorithm that sets $r_i = c \cdot \mu_i$
has approximation ratio at least $\alpha(c) - \epsilon$.

Claim (Numerical) 4.3. $\max_{c \in [0,1]} \alpha(c) \ge 0.87368287$, and it is achieved for $c \approx 0.86450318$.
Figure 1 shows plots of $\alpha(c)$ for $c \in [0, 1]$ and $c \in [0.83, 0.88]$.

Just like with our 0.8776-algorithm (to be presented in the next section), we can obtain a
rigorous proof of a slightly weaker version of Claim 4.3. In particular we prove that for $c = 0.86451$
(a slight modification of) Algorithm 1 gives a 0.8736-approximation. This is done in Theorem 7.2.

Let us now spend some time discussing the worst case configurations for $\alpha(c)$, as our understanding
of these will guide our choices when obtaining further improvements. Denote by $c^* \approx 0.86450318$
the optimal value of $c$. It turns out that (up to symmetry$^1$) there are two distinct worst case
configurations $\phi_1, \phi_2$ for $\alpha(c^*)$, approximately

$$\phi_1 = (0.176945,\ 0.176945,\ -0.646110), \qquad \phi_2 = (1, -1, -1).$$

The presence of the integral configuration $(1, -1, -1)$ may seem surprising at first, but has a very
natural explanation. For this configuration, we have $\widetilde{\rho} = 0$, meaning that the two vertices are
rounded completely independently, one with expectation $c$ and the other with expectation $-c$.
Thus the probability that such an edge is cut by the algorithm is precisely $\frac{1+c^2}{2}$, and since the SDP
value for this configuration is 1 this implies an upper bound of $\alpha(c) \le \frac{1+c^2}{2}$, meaning that $c$ needs
to be sufficiently large in order for us to obtain a good approximation ratio. Indeed, the $c < c^*$
part of Figure 1 follows this curve.
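The cut probability for the integral configuration is elementary enough to check directly. The sketch below assumes only what the paragraph states: the two endpoints are rounded independently to $\pm 1$ with expectations $c$ and $-c$. The function name `cut_probability_independent` is introduced here for illustration.

```python
def cut_probability_independent(e1, e2):
    """Probability that two independent +/-1 variables with means e1 and e2 differ."""
    p1 = (1 + e1) / 2   # Pr[x1 = +1]
    p2 = (1 + e2) / 2   # Pr[x2 = +1]
    return p1 * (1 - p2) + (1 - p1) * p2

C_STAR = 0.86450318     # optimal c from Claim 4.3

# The SDP value of (1, -1, -1) is (1 - rho) / 2 = 1, so the ratio equals the cut probability.
ratio_on_integral = cut_probability_independent(C_STAR, -C_STAR)
print(ratio_on_integral, (1 + C_STAR ** 2) / 2)
```

Both printed numbers agree with the value 0.87368287 of Claim 4.3 up to rounding, which is consistent with this configuration being tight at $c^*$.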

The other worst case configuration is more interesting, and is quite similar to the kind of
configuration that is the worst for the Goemans-Williamson Max Cut algorithm. On this configuration,
the approximation ratio improves as $c$ decreases. Intuitively, this is because the configuration has
both vertices biased in the same direction, so putting less importance on the bias results in a greater
probability that the edge is cut. The optimal choice $c^*$ is the point where the ratio on $\phi_1$ meets
the curve $\frac{1+c^2}{2}$.



4.1 Limitations 

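Even though $\max_c \alpha(c) \approx 0.8736$, it is possible that a better ratio could be obtained by choosing $c$
adaptively after seeing the graph $G$ and SDP solution $v_0, \ldots, v_n$. To rule out the possibility of any
significant improvement of this form, we exhibit a distribution $D_{\mathrm{Conf}}$ over configurations $(\mu_1, \mu_2, \rho)$
such that

$$\max_{c \in [0,1]} \frac{\mathbb{E}_{(\mu_1, \mu_2, \rho) \sim D_{\mathrm{Conf}}}\left[1 - \Lambda_{\widetilde{\rho}}(c \cdot \mu_1, c \cdot \mu_2)\right]}{\mathbb{E}_{(\mu_1, \mu_2, \rho) \sim D_{\mathrm{Conf}}}\left[(1 - \rho)/2\right]} \le \ldots \qquad (4)$$

The distribution is quite simple, and is only supported on the two worst-case configurations for
$\alpha(c^*)$. Specifically, $(\mu_1, \mu_2, \rho) \sim D_{\mathrm{Conf}}$ is chosen as

$\phi_1$ with probability 0.931935,

$\phi_2$ otherwise.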

In Figure 2 we plot the approximation ratio on $\phi_1$ and $\phi_2$ as a function of $c$, as well as the ratio
of (4) as a function of c. While the latter curve might appear to be a constant, it does have small 
variations of order 10 . 
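The ratios on the two configurations and on their mixture can be evaluated numerically with the Python standard library. The following is a rough sketch under stated assumptions, not the computation used for the paper: it takes $\Lambda_{\widetilde{\rho}}(r_1, r_2) = 2\Gamma_{\widetilde{\rho}}(q_1, q_2) + 1 - q_1 - q_2$ with $q_i = (1 - r_i)/2$ (as in the proof of Lemma 3.6), computes $\Gamma$ by Simpson's rule on the one-dimensional integral from the proof of Lemma 2.7, and sets $\widetilde{\rho} = 0$ when a coordinate is integral. All function names are illustrative.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard Gaussian: pdf, cdf and inv_cdf

def gamma(rho_t, q1, q2, steps=2000):
    """Pr[X <= Phi^{-1}(q1), Y <= Phi^{-1}(q2)] for correlation rho_t, by Simpson's rule.

    Requires |rho_t| < 1 and 0 < q1, q2 < 1; steps must be even.
    """
    t1, t2 = STD.inv_cdf(q1), STD.inv_cdf(q2)
    lo = -10.0                                   # Gaussian mass below -10 is negligible
    scale = math.sqrt(1 - rho_t * rho_t)
    f = lambda x: STD.pdf(x) * STD.cdf((t2 - rho_t * x) / scale)
    h = (t1 - lo) / steps
    total = f(lo) + f(t1)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3

def rho_tilde(mu1, mu2, rho):
    denom = math.sqrt((1 - mu1 ** 2) * (1 - mu2 ** 2))
    # Assumption: for an integral coordinate we use 0 (independent rounding).
    return 0.0 if denom == 0 else (rho - mu1 * mu2) / denom

def alpha(mu1, mu2, rho, r1, r2):
    """Ratio between cut probability and SDP value (1 - rho) / 2 on one configuration."""
    q1, q2 = (1 - r1) / 2, (1 - r2) / 2
    agree = 2 * gamma(rho_tilde(mu1, mu2, rho), q1, q2) + 1 - q1 - q2   # Lambda
    return 2 * (1 - agree) / (1 - rho)

PHI_1 = (0.176945, 0.176945, -0.646110)
PHI_2 = (1.0, -1.0, -1.0)
WEIGHT_1 = 0.931935
C_STAR = 0.86450318

def mixture_ratio(c):
    """Expected cut probability over expected SDP value under the two-point distribution."""
    num = den = 0.0
    for weight, (mu1, mu2, rho) in ((WEIGHT_1, PHI_1), (1 - WEIGHT_1, PHI_2)):
        sdp = (1 - rho) / 2
        num += weight * sdp * alpha(mu1, mu2, rho, c * mu1, c * mu2)
        den += weight * sdp
    return num / den

for c in (0.5, 0.7, C_STAR, 0.95):               # c = 1 is excluded: inv_cdf needs 0 < q < 1
    print(c, round(mixture_ratio(c), 5))
```

At $c = c^*$ both configurations give a ratio of about 0.8737 in this sketch, in line with the values reported in Section 4.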



5 Pairing Vertices: A 0.8776-approximation 

In this section we describe a bias selection algorithm which yields a 0.8776-approximation for Max
Bisection. Let us start with an informal description of how to obtain the improvement. Recall
from the discussion on the algorithm in Section 4 that an obstacle to further improvements was
the "conflict" between the two critical configurations: $\phi_1$, which resembled a critical configuration
for Max Cut, and $\phi_2$, which was just an integral configuration. Arguably, configurations like $\phi_1$ in
some sense capture the difficulty of Max Bisection, whereas the integral configuration $\phi_2$ should
be easy. With this in mind, it is natural to decrease the value of $c$ to perform better on $\phi_1$ and
similar configurations, and then do some adjustments on vertices with large $|\mu_i|$ in order to perform
well on $\phi_2$ and more generally on near-integral configurations.

$^1$ Due to symmetry, the configuration $(-\mu_1, -\mu_2, \rho)$ is completely equivalent to $(\mu_1, \mu_2, \rho)$.

Figure 2: Approximation ratio on the two configurations and their mixture as a function of $c$ (curves: ratio on $\phi_1$, ratio on $\phi_2$, ratio on mixture).

A first idea is the following: as long as there are edges $(i, j)$ which are near-integral in the SDP
solution, say, $\mu_i > 1 - \delta$ and $\mu_j < -1 + \delta$ for some small constant $\delta$, set $r_i = 1 - \delta$ and $r_j = -1 + \delta$,
respectively. Once all such edges are processed, use $r_i = c \cdot \mu_i$ for all other vertices. Once one takes
care of some technical details this idea can be made to work; however, the improvement over the
algorithm of the previous section is minor, of order, say, $10^{-4}$.

In order to get a more impressive improvement, we use a "smooth" version of the above idea.
As in the linear bias selection, we start by assigning $r_i = c \cdot \mu_i$ for all $i$. We then pick off pairs of
vertices such that $\mu_i > 0$ and $\mu_j < 0$ are as large as possible (in absolute value). We then
add some value $\Delta r > 0$ to $r_i$ and subtract $\Delta r$ from $r_j$. Clearly, this operation preserves $\sum_i r_i = 0$.
The remaining choice is now how to choose the "boost" $\Delta r$. It is somewhat natural to restrict
ourselves to choosing $\Delta r := (1 - c) f(\min(|\mu_i|, |\mu_j|))$ where $f : [0,1] \to [0,1]$ is a non-decreasing
function which for technical reasons we require to be Lipschitz continuous and satisfy $f(0) = 0$. We
refer to any such $f$ as a "boost function". Notice that before the boosting all biases are in the
interval $[-c, c]$ so after the boosting all biases are in $[-1, 1]$, i.e., valid. Ultimately we choose $f$ to
be piecewise linear, though it is quite possible that further improvements are possible with more
complicated choices of $f$. More formally, our bias values are given by Algorithm 2.

Algorithm 2 Max Bisection bias selection

Input: $\mu_1, \ldots, \mu_n \in [-1, 1]$, parameter $c \in [0, 1]$, boost function $f : [0, 1] \to [0, 1]$
Output: Biases $r_1, \ldots, r_n \in [-1, 1]$ such that $\sum_i r_i = 0$.

 1: $r_i \leftarrow c \cdot \mu_i$ for $i \in [n]$
 2: $S \leftarrow V$
 3: while $\max_{i \in S} \mu_i > 0$ and $\min_{j \in S} \mu_j < 0$ do
 4:     $i \leftarrow \arg\max_{i \in S} \mu_i$
 5:     $j \leftarrow \arg\min_{j \in S} \mu_j$
 6:     $\beta \leftarrow \min(|\mu_i|, |\mu_j|)$
 7:     $r_i \leftarrow r_i + (1 - c) f(\beta)$
 8:     $r_j \leftarrow r_j - (1 - c) f(\beta)$
 9:     $S \leftarrow S \setminus \{i, j\}$
10: end while
11: return $r_1, \ldots, r_n$

Let us now analyze the performance of the algorithm. As opposed to the linear bias selection
algorithm used in the previous section, given some configuration $(\mu_1, \mu_2, \rho)$ we do not know exactly
what $r$-values were used to round it. However, we do have the following Lemma, which provides
bounds on these $r$-values.

Lemma 5.1. For any vertex $i$, the value $r_i$ produced by Algorithm 2 satisfies

$$\mathrm{sgn}(r_i) = \mathrm{sgn}(\mu_i), \qquad c|\mu_i| \le |r_i| \le c|\mu_i| + (1 - c) f(|\mu_i|) \le 1. \qquad (5)$$

Furthermore, for any vertex $j$ such that $\mathrm{sgn}(\mu_i) \ne \mathrm{sgn}(\mu_j)$ one of the following two holds:

$$|r_j| \ge c|\mu_j| + (1 - c) f(\min(|\mu_i|, |\mu_j|)), \quad \text{or} \quad |r_i| \ge c|\mu_i| + (1 - c) f(\min(|\mu_i|, |\mu_j|)). \qquad (6)$$

In other words, for a pair of vertices whose $\mu$-values are of opposite sign, at least one of them
picks up a "boost" which is as large as the boost of the smaller of the two.
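The pairing procedure and the two bounds of the lemma can be exercised on random inputs. The sketch below is one possible reading of Algorithm 2: it assumes the loop pairs the largest remaining positive $\mu$ with the most negative remaining $\mu$ until one sign runs out, which is equivalent to zipping the two sorted lists. The names `select_biases`, `boost` and `satisfies_lemma_5_1` are introduced here; the default boost uses the constants of Claim 5.6.

```python
import random

def boost(x, slope=1.618, knee=0.478):
    """Piecewise-linear boost function f(x) = slope * max(0, x - knee)."""
    return slope * max(0.0, x - knee)

def select_biases(mu, c, f=boost):
    """Start from r_i = c * mu_i, then boost matched positive/negative pairs."""
    r = [c * m for m in mu]
    positives = sorted((i for i, m in enumerate(mu) if m > 0), key=lambda i: -mu[i])
    negatives = sorted((j for j, m in enumerate(mu) if m < 0), key=lambda j: mu[j])
    for i, j in zip(positives, negatives):       # k-th largest with k-th most negative
        delta = (1 - c) * f(min(abs(mu[i]), abs(mu[j])))
        r[i] += delta
        r[j] -= delta
    return r

def satisfies_lemma_5_1(mu, r, c, f=boost, tol=1e-12):
    """Check bounds (5) for every vertex and (6) for every opposite-sign pair."""
    for m, b in zip(mu, r):
        upper = c * abs(m) + (1 - c) * f(abs(m))
        if m * b < 0 or abs(b) < c * abs(m) - tol or abs(b) > upper + tol or upper > 1 + tol:
            return False
    for i in range(len(mu)):
        for j in range(len(mu)):
            if mu[i] * mu[j] < 0:
                d = (1 - c) * f(min(abs(mu[i]), abs(mu[j])))
                if abs(r[i]) < c * abs(mu[i]) + d - tol and abs(r[j]) < c * abs(mu[j]) + d - tol:
                    return False
    return True

random.seed(0)
mu = [random.uniform(-1, 1) for _ in range(60)]
c = 0.8056
r = select_biases(mu, c)
print(satisfies_lemma_5_1(mu, r, c), sum(r) - c * sum(mu))
```

Each boost is added to one bias and subtracted from another, so $\sum_i r_i = c \sum_i \mu_i$; in particular the biases sum to zero whenever the $\mu_i$ do.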

Proof of Lemma 5.1. The first part, (5), is straightforward: the value of $r_i$ is initialized to $c\mu_i$,
which clearly satisfies (5). After this, it is changed at most once, in which case it has the value
$(1 - c) f(\beta)$ added or subtracted to it depending on the sign of $\mu_i$, and by monotonicity of $f$ and
the fact that $\beta = \min(|\mu_i|, |\mu_{j'}|)$ for some $j'$ we have $f(\beta) \le f(|\mu_i|)$.

For the second part, (6), notice that at least one of $i$ and $j$ has to be selected in the loop of the
algorithm. We consider two cases, depending on which was selected first. Suppose $j$ was selected
before or in the same iteration as $i$. It was then selected together with some vertex $i' \in V$ such
that $\mathrm{sgn}(\mu_{i'}) = -\mathrm{sgn}(\mu_j) = \mathrm{sgn}(\mu_i)$ and $|\mu_{i'}| \ge |\mu_i|$. Thus the boost given to $j$ was at least

$$(1 - c) f(\min(|\mu_j|, |\mu_{i'}|)) \ge (1 - c) f(\min(|\mu_j|, |\mu_i|)),$$

where we have used monotonicity of $f$.

The other case, when vertex $i$ is selected before vertex $j$, is completely symmetric. □

Definition 5.2. Given $\mu_1, \mu_2 \in [-1, 1]$ the permissible biases $R_{c,f}(\mu_1, \mu_2) \subseteq [-1, 1] \times [-1, 1]$ of the
pair are all values of $r_1, r_2$ satisfying (5) if $\mathrm{sgn}(\mu_1) = \mathrm{sgn}(\mu_2)$ and (5-6) if $\mathrm{sgn}(\mu_1) \ne \mathrm{sgn}(\mu_2)$.

Notice that the permissible biases of a pair $(\mu_1, \mu_2)$ depend on the parameters $c \in [0, 1]$ and
$f : [0, 1] \to [0, 1]$ (a monotone function satisfying $f(0) = 0$) of Algorithm 2, hence the notation $R_{c,f}$.
Note that when $\mathrm{sgn}(\mu_1) = \mathrm{sgn}(\mu_2)$, $R_{c,f}(\mu_1, \mu_2)$ is of the form $I_1 \times I_2$ where $I_1$ (resp. $I_2$) is the
interval of $r$-values with the same sign as $\mu_1$ (resp. $\mu_2$) satisfying (5). When $\mathrm{sgn}(\mu_1) \ne \mathrm{sgn}(\mu_2)$
then $R_{c,f}(\mu_1, \mu_2)$ can be similarly written as a union $I_1 \times I_2 \cup I'_1 \times I'_2$, corresponding to which of
the two variables $r_1$ and $r_2$ is subject to the stronger bound of (6).

Now, we can lower bound the approximation ratio of the resulting algorithm by computing the
minimum of $\alpha(\mu_1, \mu_2, \rho, r_1, r_2)$ over all permissible biases. This motivates the following definitions.

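Definition 5.3. For $c \in [0, 1]$ and a boost function $f : [0, 1] \to [0, 1]$, define

$$\alpha_{c,f}(\mu_1, \mu_2, \rho) = \min_{(r_1, r_2) \in R_{c,f}(\mu_1, \mu_2)} \alpha(\mu_1, \mu_2, \rho, r_1, r_2),$$

and let

$$\alpha(c, f) = \min_{(\mu_1, \mu_2, \rho) \in \mathrm{Conf}} \alpha_{c,f}(\mu_1, \mu_2, \rho).$$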



An immediate corollary of Lemma 5.1 and Lemma 3.2 is that, for a fixed value of $c$ and $f$,
the approximation ratio when using Algorithm 2 to select biases is at least $\alpha(c, f)$. Thus, for a
given $c$ and $f$ the approximation ratio of the algorithm can be computed as a five-dimensional
optimization problem. For a general $f$ however, the domain of feasible points may not even be
convex. While it turns out that the problem is convex for the $f$ that we ultimately use, we show
that the optimization over $r_1$ and $r_2$ can be eliminated so that we are again left with a minimization
problem over the space of all configurations. This significantly simplifies the computations needed
to evaluate $\alpha(c, f)$ and makes our computer-assisted case analysis feasible.

As $R_{c,f}(\mu_1, \mu_2)$ is either of the form $I_1 \times I_2$ or $I_1 \times I_2 \cup I'_1 \times I'_2$ for some intervals $I_1, I_2, I'_1, I'_2 \subseteq
[-1, 1]$, we make the following definition.

Definition 5.4. For a configuration $\mu_1, \mu_2, \rho$ and two closed intervals $I_1 = [a_1, b_1], I_2 = [a_2, b_2] \subseteq
[-1, 1]$, define

$$\alpha(\mu_1, \mu_2, \rho, I_1, I_2) = \min_{r_1 \in I_1,\ r_2 \in I_2} \alpha(\mu_1, \mu_2, \rho, r_1, r_2). \qquad (7)$$

Minimizing $\alpha$ over $(r_1, r_2) \in R_{c,f}(\mu_1, \mu_2)$ boils down to at most two computations of $\alpha(\mu_1, \mu_2, \rho, I_1, I_2)$.
We have the following lemma, which narrows the search over $r_1, r_2$ down to at most nine different
possibilities. The proof is rather technical and is left for the end of the current section.

Lemma 5.5. For every configuration $\mu_1, \mu_2, \rho$ and closed intervals $I_1 = [a_1, b_1], I_2 = [a_2, b_2]$, we
have that

$$\alpha(\mu_1, \mu_2, \rho, I_1, I_2) = \min_{(r_1, r_2) \in S \cap I_1 \times I_2} \alpha(\mu_1, \mu_2, \rho, r_1, r_2),$$

where $S$ is defined as follows. Recall that $\widetilde{\rho} = \dfrac{\rho - \mu_1 \mu_2}{\sqrt{(1 - \mu_1^2)(1 - \mu_2^2)}}$.

• If $\widetilde{\rho} \le 0$ then $S$ is the extreme points of the set $I_1 \times I_2$, i.e.

$$S = \{(a_1, a_2), (a_1, b_2), (b_1, a_2), (b_1, b_2)\}.$$

• If $\widetilde{\rho} > 0$ then $S$ is the extreme points plus five extra points defined in terms of the function
$g(x) = 1 - 2\Phi\left(\Phi^{-1}\left(\frac{1 - x}{2}\right) / \widetilde{\rho}\right)$. More precisely,

$$S = \{(0, 0), (a_1, a_2), (a_1, b_2), (b_1, a_2), (b_1, b_2), (a_1, g(a_1)), (b_1, g(b_1)), (g(a_2), a_2), (g(b_2), b_2)\}.$$

Figure 3: The two functions $c\mu$ and $c\mu + (1 - c) f(\mu)$.



In other words, if the optimizer for (7) is not $(0, 0)$, one of the $r$-values is always at an extreme
point of its domain, and the other is either at an extreme point, or at a point directly computable
from the first value. In fact, since the permissible intervals $R_{c,f}(\mu_1, \mu_2)$ only intersect the origin
$(0, 0)$ at their endpoints, the possibility $(0, 0)$ can be discarded when computing $\alpha_{c,f}(\mu_1, \mu_2, \rho)$. Thus,
the value of $\alpha_{c,f}(\mu_1, \mu_2, \rho)$ can be computed by evaluating $\alpha(\mu_1, \mu_2, \rho, r_1, r_2)$ on at most 16 possible
bias pairs $(r_1, r_2)$.
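To make the candidate set concrete, the sketch below enumerates the points of $S$ that fall inside a rectangle $I_1 \times I_2$. It assumes $g(x) = 1 - 2\Phi(\Phi^{-1}((1 - x)/2)/\widetilde{\rho})$ as in Lemma 5.5 and extends $g$ to $x = \pm 1$ by its limiting values; the names `g` and `candidate_biases` are illustrative, and evaluating $\alpha$ on the returned points is left out.

```python
from statistics import NormalDist

STD = NormalDist()

def g(x, rho_t):
    """g(x) = 1 - 2 * Phi(Phi^{-1}((1 - x) / 2) / rho_t), for 0 < rho_t <= 1."""
    if abs(x) >= 1:          # limiting values, avoids inv_cdf(0) and inv_cdf(1)
        return x
    return 1 - 2 * STD.cdf(STD.inv_cdf((1 - x) / 2) / rho_t)

def candidate_biases(rho_t, I1, I2):
    """Candidate minimizers over I1 x I2: corners, plus the extra points when rho_t > 0."""
    (a1, b1), (a2, b2) = I1, I2
    points = [(a1, a2), (a1, b2), (b1, a2), (b1, b2)]
    if rho_t > 0:
        points.append((0.0, 0.0))
        points += [(a1, g(a1, rho_t)), (b1, g(b1, rho_t))]
        points += [(g(a2, rho_t), a2), (g(b2, rho_t), b2)]
    # Keep only the candidates that lie in the rectangle.
    return [(r1, r2) for r1, r2 in points if a1 <= r1 <= b1 and a2 <= r2 <= b2]

print(candidate_biases(-0.3, (0.1, 0.5), (-0.6, -0.2)))
print(candidate_biases(0.5, (-0.5, 0.5), (-0.5, 0.5)))
```

With two rectangles per configuration and at most eight usable candidates in each (the origin being discarded), this gives the count of at most 16 bias pairs.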

Given the above lemma we use a numerical optimizer to compute the value $\alpha(c, f)$ for a particular
choice of the parameters $c$ and $f$. The result is the following claim.

Claim (Numerical) 5.6. For $c = 0.8056$ and $f(x) = 1.618 \max(0, x - 0.478)$, $\alpha(c, f) \ge 0.87765366$.

For our formal proof that we can achieve approximation ratio at least 0.8776, we need to modify
Algorithm 1 slightly to exclude certain types of configurations that are challenging for our prover
program. In particular, we are only able to prove a good approximation ratio for configurations in
which all $|\mu_i|$'s and $|\rho_{ij}|$'s are bounded away from 1, so we modify Algorithm 1 to perform a simple
preprocessing step on the vectors first to make sure that they are not too close to being integral.
The details of this appear in Section 7, with the 0.8776-algorithm being given by Theorem 7.3.
The choice of $c$ and $f$ used in our formal proof is the same as in Claim 5.6.

Figure 3 shows the graphs of the two functions $\mu \mapsto c\mu$ and $\mu \mapsto c\mu + (1 - c) f(\mu)$, corresponding
to the typical lower and upper bound for the bias $r$ as a function of $\mu$, for the values used in
Claim 5.6.

When attempting to improve the approximation ratio, it turns out that there are now several
different forms of critical or near-critical configurations, each of which imposes some restrictions
on the behaviour of $c$ and $f$. Moreover, as is common for this type of algorithm, our computations
indicate that the worst configurations $\mu_1, \mu_2, \rho$ lie at the surface of the space of configurations Conf.
In Figure 4, we give contour plots of $\alpha_{c,f}(\mu_1, \mu_2, \rho)$ along this surface.

We now come back to the proof of Lemma 5.5 restated here for convenience.

Lemma 5.5 (restated). For every configuration $\mu_1, \mu_2, \rho$ and closed intervals $I_1 = [a_1, b_1], I_2 =
[a_2, b_2]$, we have that

$$\alpha(\mu_1, \mu_2, \rho, I_1, I_2) = \min_{(r_1, r_2) \in S \cap I_1 \times I_2} \alpha(\mu_1, \mu_2, \rho, r_1, r_2),$$

where $S$ is defined as follows. Recall that $\widetilde{\rho} = \dfrac{\rho - \mu_1 \mu_2}{\sqrt{(1 - \mu_1^2)(1 - \mu_2^2)}}$.

• If $\widetilde{\rho} \le 0$ then $S$ is the extreme points of the set $I_1 \times I_2$, i.e.

$$S = \{(a_1, a_2), (a_1, b_2), (b_1, a_2), (b_1, b_2)\}.$$

• If $\widetilde{\rho} > 0$ then $S$ is the extreme points plus five extra points defined in terms of the function
$g(x) = 1 - 2\Phi\left(\Phi^{-1}\left(\frac{1 - x}{2}\right) / \widetilde{\rho}\right)$. More precisely,

$$S = \{(0, 0), (a_1, a_2), (a_1, b_2), (b_1, a_2), (b_1, b_2), (a_1, g(a_1)), (b_1, g(b_1)), (g(a_2), a_2), (g(b_2), b_2)\}.$$



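Proof. We consider several cases depending on the value of $\widetilde{\rho}$.

Case 1: $|\widetilde{\rho}| < 1$. Using Lemma 2.8 and the definition of $\Lambda_{\widetilde{\rho}}$ we have that

$$\frac{\partial}{\partial r_1} \Lambda_{\widetilde{\rho}}(r_1, r_2) = \frac{1}{2} - \Phi\left(\frac{t(r_2) - \widetilde{\rho}\, t(r_1)}{\sqrt{1 - \widetilde{\rho}^2}}\right),$$

where we write $t(r) = \Phi^{-1}\left(\frac{1 - r}{2}\right)$. Thus,

$$\frac{\partial}{\partial r_1} \alpha(\mu_1, \mu_2, \rho, r_1, r_2) = \frac{-2}{1 - \rho} \frac{\partial}{\partial r_1} \Lambda_{\widetilde{\rho}}(r_1, r_2) = \frac{2}{1 - \rho}\left(\Phi\left(\frac{t(r_2) - \widetilde{\rho}\, t(r_1)}{\sqrt{1 - \widetilde{\rho}^2}}\right) - \frac{1}{2}\right). \qquad (8)$$

Subcase 1.1: $-1 < \widetilde{\rho} \le 0$. Computing the second derivative of $\alpha$ with respect to $r_1$ we have

$$\frac{\partial^2}{\partial r_1^2} \alpha(\mu_1, \mu_2, \rho, r_1, r_2) = \frac{-2\widetilde{\rho}\, t'(r_1)}{(1 - \rho)\sqrt{1 - \widetilde{\rho}^2}}\, \phi\left(\frac{t(r_2) - \widetilde{\rho}\, t(r_1)}{\sqrt{1 - \widetilde{\rho}^2}}\right) = \frac{\widetilde{\rho}}{\phi(t(r_1))\,(1 - \rho)\sqrt{1 - \widetilde{\rho}^2}}\, \phi\left(\frac{t(r_2) - \widetilde{\rho}\, t(r_1)}{\sqrt{1 - \widetilde{\rho}^2}}\right) \le 0,$$

where we have used that $\frac{d}{dx} \Phi^{-1}(x) = 1 / \phi(\Phi^{-1}(x))$. Thus $\alpha$ is concave in $r_1$ in this subcase. By
symmetry the same holds for $r_2$ as well, which implies that $\alpha(\mu_1, \mu_2, \rho, I_1, I_2)$ is minimized at one
of the four extreme points as claimed.

Subcase 1.2: $0 < \widetilde{\rho} < 1$. This case requires a little more work. Fix some minimum $r_1^*, r_2^*$ of (7).
If $(r_1^*, r_2^*) \in \{a_1, b_1\} \times \{a_2, b_2\}$ we are done, so we can assume that one of $r_1^*$ and $r_2^*$ lies strictly
inside its interval. Suppose $r_1^*$ is in the interior of $I_1$, i.e., $a_1 < r_1^* < b_1$. Then necessarily

$$\frac{\partial}{\partial r_1} \alpha(\mu_1, \mu_2, \rho, r_1^*, r_2^*) = 0.$$

By (8) this implies that

$$\Phi\left(\frac{t(r_2^*) - \widetilde{\rho}\, t(r_1^*)}{\sqrt{1 - \widetilde{\rho}^2}}\right) = \frac{1}{2},$$

which has the unique solution

$$\frac{t(r_2^*) - \widetilde{\rho}\, t(r_1^*)}{\sqrt{1 - \widetilde{\rho}^2}} = 0,$$

or equivalently,

$$t(r_2^*) = \widetilde{\rho}\, t(r_1^*). \qquad (9)$$

Similarly, if $a_2 < r_2^* < b_2$, it must be the case that $t(r_1^*) = \widetilde{\rho}\, t(r_2^*)$.

This implies that if both $r_1^*$ and $r_2^*$ lie strictly inside their respective intervals then $t(r_1^*) =
\widetilde{\rho}\, t(r_2^*) = \widetilde{\rho}^2 t(r_1^*)$. As $|\widetilde{\rho}| < 1$ this implies $t(r_1^*) = t(r_2^*) = 0$, which has the unique solution
$r_1^* = r_2^* = 0$.

On the other hand if exactly one of $r_1^*$ and $r_2^*$ lies strictly inside its respective interval, say $r_1^*$,
then by (9), $r_1^* = t^{-1}(t(r_2^*) / \widetilde{\rho}) = g(r_2^*)$.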



Case 2: $\widetilde{\rho} = 1$. In this case,

$$\alpha(\mu_1, \mu_2, \rho, r_1, r_2) = \frac{2(1 - \Lambda_1(r_1, r_2))}{1 - \rho} = \frac{2\left(1 - \Pr_{g \sim N(0,1)}\left[g \notin [t(r_1), t(r_2)]\right]\right)}{1 - \rho} = \frac{|r_1 - r_2|}{1 - \rho}. \qquad (10)$$

The minimizer $(r_1^*, r_2^*)$ of this expression depends on whether $I_1 \cap I_2 = \emptyset$ or not. If $I_1 \cap I_2 = \emptyset$ then
the unique minimizer $(r_1^*, r_2^*)$ is in $\{a_1, b_1\} \times \{a_2, b_2\}$. Otherwise, if $I_1 \cap I_2 \ne \emptyset$, then the minimum
is zero and any $r_1^* = r_2^* = r^* \in I_1 \cap I_2$ is a minimizer of (10). In particular we can choose $r^*$ to be
the endpoint of one of the intervals. Noting that when $\widetilde{\rho} = 1$ we have $g(x) = x$ finishes this case.

Case 3: $\widetilde{\rho} = -1$. Similarly to the previous case we now have

$$\alpha(\mu_1, \mu_2, \rho, r_1, r_2) = \frac{2(1 - \Lambda_{-1}(r_1, r_2))}{1 - \rho} = \frac{2\left(1 - \Pr_{g \sim N(0,1)}\left[g \in [-t(r_1), t(r_2)]\right]\right)}{1 - \rho} = \frac{2 - |r_1 + r_2|}{1 - \rho}.$$

The unique minimizer $(r_1^*, r_2^*)$ of this expression is clearly in $\{a_1, b_1\} \times \{a_2, b_2\}$. □

6 Max Bisect-2-Sat



Algorithm 1 can be directly applied to any Max Bisect-CSP(P). In particular it is interesting 
to do this for Max Bisect-2-Sat. 

In the language of this paper, the best algorithm for Max 2-Sat [LLZ02] uses a linear bias
selection algorithm $r_i = c \cdot \mu_i$ (see description in [Aus07]) and so it already satisfies the constraint
$\sum_i r_i = 0$. Thus it immediately extends to the case of Max Bisect-2-Sat, implying that this
problem is approximable within $\alpha_{LLZ} - \epsilon$ for every $\epsilon > 0$, where $\alpha_{LLZ} \approx 0.9401$ is the approximation
threshold for Max 2-Sat assuming the UGC.

Furthermore, it is easy to see that for any predicate P, Max Bisect-CSP(P) is at least as hard 
as Max CSP(P); the reduction from Max CSP(P) to Max Bisect-CSP(P) simply produces 
two disjoint copies of the Max CSP(P) instance and negates all literals in one of the copies. 
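The reduction is easy to write down for 2-Sat. The sketch below uses an encoding chosen here for illustration (a literal is a pair `(variable_index, is_positive)`, a clause is a tuple of literals) and checks the direction that any assignment of the original instance extends to a balanced assignment of the doubled instance satisfying exactly twice as many clauses.

```python
def to_bisect_instance(n, clauses):
    """Two disjoint copies; the second copy uses variables n..2n-1 with all literals negated."""
    negated_copy = [tuple((v + n, not positive) for v, positive in clause) for clause in clauses]
    return 2 * n, list(clauses) + negated_copy

def num_satisfied(assignment, clauses):
    """Number of clauses with at least one true literal."""
    return sum(any(assignment[v] == positive for v, positive in clause) for clause in clauses)

def extend_balanced(assignment):
    """Give the second copy the complementary assignment; exactly half of all variables are True."""
    return list(assignment) + [not value for value in assignment]

n = 3
clauses = [((0, True), (1, False)), ((1, True), (2, True)), ((0, False), (2, False))]
x = [True, True, True]

m, doubled = to_bisect_instance(n, clauses)
y = extend_balanced(x)
print(num_satisfied(x, clauses), num_satisfied(y, doubled), sum(y), m // 2)
```

Since a negated literal under the complementary assignment has the same truth value as the original literal, the second copy satisfies exactly the same clauses as the first.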

In particular, this implies that assuming the UGC, the approximation threshold of Max
Bisect-2-Sat is the same as the threshold for Max 2-Sat, namely $\alpha_{LLZ} \approx 0.9401$. This fact, that
the balance constraint does not make Max 2-Sat harder, can be seen as circumstantial evidence
that Max Bisection is as easy as Max Cut.

7 Proofs of Approximation Ratios 

Unfortunately, our formal proofs of approximation ratios are based on case analysis of several
million cases, and we therefore have to construct them with the assistance of a computer. The case
analysis is similar to that of e.g., [Zwi02, Sjö09] and proceeds by recursively dividing the search
space $[-1, 1]^3$ into subcubes. When processing a cube $C \subseteq [-1, 1]^3$, we can compute lower and
upper bounds on the performance of our algorithm $\alpha(\mu_1, \mu_2, \rho, r_1, r_2)$ for $(\mu_1, \mu_2, \rho) \in C$. To handle
this and to also take care of the rounding errors inherent in finite precision calculations, we use
interval arithmetic.

When processing a cube $C$, there are four possibilities:

1. C is completely outside the space of configurations Conf . 

2. The lower bound on a in the cube exceeds the approximation ratio we are trying to prove. 

3. The upper bound on a in the cube is lower than the approximation ratio we are trying to 
prove. 

4. None of the above: the case is inconclusive. Then we subdivide C into eight subcubes in the 
natural way, and we check each of them recursively. 
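The four-way case split can be illustrated on a toy problem. In the sketch below everything specific is an assumption made for illustration: the "performance" function is $h(x, y, z) = 1 + 0.4xy - 0.3z^2$, the stand-in for Conf is the unit ball, and the target bound is 0.6. Bounds on a cube are computed exactly for this simple $h$ with ordinary floats, whereas a real proof would use interval arithmetic with outward rounding.

```python
from itertools import product

TARGET = 0.6  # bound we try to prove for h on the toy domain

def square_range(lo, hi):
    """Range of x^2 for x in [lo, hi]."""
    low = 0.0 if lo <= 0.0 <= hi else min(lo * lo, hi * hi)
    return low, max(lo * lo, hi * hi)

def product_range(x, y):
    """Range of x * y over a box: attained at the corners."""
    corners = [a * b for a in x for b in y]
    return min(corners), max(corners)

def toy_bounds(cube):
    """Lower and upper bound of h(x, y, z) = 1 + 0.4 x y - 0.3 z^2 over the cube."""
    x, y, z = cube
    xy_lo, xy_hi = product_range(x, y)
    z_lo, z_hi = square_range(*z)
    return 1 + 0.4 * xy_lo - 0.3 * z_hi, 1 + 0.4 * xy_hi - 0.3 * z_lo

def outside_domain(cube):
    """True if the cube misses the unit ball entirely (case 1)."""
    return sum(square_range(lo, hi)[0] for lo, hi in cube) > 1.0

def process(cube, stats, depth=0, max_depth=8):
    if outside_domain(cube):
        stats["outside"] += 1                      # case 1
        return
    lower, upper = toy_bounds(cube)
    if lower >= TARGET:
        stats["proved"] += 1                       # case 2
    elif upper < TARGET:
        stats["violated"] += 1                     # case 3: the claimed bound is false
    elif depth == max_depth:
        stats["unresolved"] += 1                   # precision threshold reached
    else:                                          # case 4: split into eight subcubes
        halves = [((lo, (lo + hi) / 2), ((lo + hi) / 2, hi)) for lo, hi in cube]
        for sub in product(*halves):
            process(sub, stats, depth + 1, max_depth)

stats = {"outside": 0, "proved": 0, "violated": 0, "unresolved": 0}
process(((-1.0, 1.0),) * 3, stats)
print(stats)
```

A run with no violated and no unresolved cubes certifies the target bound on the toy domain; here $h \ge 0.7$ on the unit ball, so the search terminates after a few levels of subdivision.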

Note that we need to run the above test until we reach our precision threshold and no inconclusive
cases remain. Also, this will translate into a proof for our approximation performance as long as we
avoid case 3. Unfortunately, it turns out that there is one issue to deal with. Specifically, consider
a configuration $(\mu_1, \mu_2, \rho)$ where $\mu_1 \approx \pm 1$, or more precisely a cube $C$ such that all configurations
in $C$ have $\mu_1 \approx \pm 1$. Then the dependence of $\widetilde{\rho} = \dfrac{\rho - \mu_1 \mu_2}{\sqrt{(1 - \mu_1^2)(1 - \mu_2^2)}}$ on $\mu_1$ is not Lipschitz continuous,
meaning that even when the cube is small the uncertainty in $\widetilde{\rho}$ can be very large, which in turn
results in poor bounds on the value of $\alpha$, and in particular our lower bound will not be strong enough
to conclude that this case is not problematic. This turns out to be not just a hypothetical issue,
but a very real one, because in our algorithm there are worst or near-worst configurations which
have $\mu_1$ (or $\mu_2$) close or equal to $\pm 1$. A similar issue occurs when $\rho \approx 1$, in which case the SDP
value is close to 0 and we need very sharp estimates on $\widetilde{\rho}$ in order to get a sufficiently strong lower
bound on $\alpha$. To overcome this, the simplest recourse is to slightly alter Algorithm 1 by adding a
preprocessing step which precludes configurations $(\mu_1, \mu_2, \rho_{12})$ where $|\mu_i|$ or $|\rho_{12}|$ is close to 1. This
causes us an additional small loss in the SDP objective value. We have the following lemma.

Lemma 7.1. Given $\delta > 0$ and an SDP solution $v_0, \ldots, v_n$ (unit vectors), we can construct in
polynomial time a new SDP solution $v'_0, \ldots, v'_n$ (unit vectors) such that

1. $\mathrm{SDPVal}(\{v'_i\}) \ge \mathrm{SDPVal}(\{v_i\}) - \delta$,

2. $|\langle v'_i, v'_j \rangle| \le 1 - \delta$ for every $0 \le i < j \le n$,

3. If $\{v_i\}$ satisfies the triangle inequalities then so does $\{v'_i\}$,

4. If $\{v_i\}$ is $\epsilon$-uncorrelated for some $\epsilon > 0$ then so is $\{v'_i\}$.

Proof sketch. We replace every vector $v_1, \ldots, v_n$ by $v'_i = (1 - \delta) v_i + \sqrt{1 - (1 - \delta)^2}\, u_i$, where $u_i$ is a unit vector
orthogonal to all other vectors (we keep $v'_0 = v_0$ the same). This has the effect of scaling all $\mu_i$'s
by $1 - \delta$ and all $\rho_{ij}$'s by $(1 - \delta)^2$, i.e.,

$$\mu'_i = (1 - \delta)\mu_i, \qquad \rho'_{ij} = (1 - \delta)^2 \rho_{ij}.$$

As a result, the four items can be proven through straightforward calculations. The last item may be
easier to think about in the probabilistic view: in terms of the local distributions, the transformation
we did has the effect of mixing the local distributions with the uniform distribution, which clearly
only decreases correlations. □
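The preprocessing step can be tried on a small example. The sketch below assumes the coefficient on $v_i$ is $1 - \delta$ (so that the $\mu_i$ scale by $1 - \delta$ and the $\rho_{ij}$ by $(1 - \delta)^2$, as stated in the proof sketch) and realizes each fresh orthogonal direction $u_i$ by appending a new coordinate. The function names and the example vectors are made up for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def smooth(vectors, delta):
    """Keep vectors[0] = v_0; map v_i to (1 - delta) v_i + sqrt(1 - (1 - delta)^2) u_i.

    Each u_i is a fresh unit direction, realized as one extra coordinate per vector.
    """
    n = len(vectors) - 1
    a = 1 - delta
    b = math.sqrt(1 - a * a)
    smoothed = [list(vectors[0]) + [0.0] * n]
    for i in range(1, n + 1):
        fresh = [0.0] * n
        fresh[i - 1] = b
        smoothed.append([a * x for x in vectors[i]] + fresh)
    return smoothed

# v_0 followed by three unit vectors; v_1 and v_2 are integral (mu = +1 and -1).
vectors = [(1.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (0.6, 0.8)]
delta = 0.1
out = smooth(vectors, delta)

for i in range(1, len(out)):
    print(i, round(dot(out[0], out[i]), 4), round(dot(out[i], out[i]), 4))
```

After smoothing, every pairwise inner product has absolute value at most $1 - \delta$, so no configuration touches the boundary where $\widetilde{\rho}$ is badly behaved.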

With this lemma in place, it is natural to introduce a variation $\mathrm{Conf}_\delta$ of the space of
configurations $\mathrm{Conf} \subseteq [-1, 1]^3$, where we exclude all configurations where some coordinate exceeds $1 - \delta$
in absolute value, i.e.,

$$\mathrm{Conf}_\delta := \mathrm{Conf} \cap [-1 + \delta, 1 - \delta]^3.$$

We refer to such configurations as smooth. We then extend the various $\alpha$ definitions which involve
minimization over Conf in a similar way: analogously to Definition 4.1 we write

$$\alpha_\delta(c) := \min_{(\mu_1, \mu_2, \rho) \in \mathrm{Conf}_\delta} \alpha(\mu_1, \mu_2, \rho, c \cdot \mu_1, c \cdot \mu_2),$$

and analogously to Definition 5.3 we write

$$\alpha_\delta(c, f) := \min_{(\mu_1, \mu_2, \rho) \in \mathrm{Conf}_\delta} \alpha_{c,f}(\mu_1, \mu_2, \rho).$$

By Lemma 7.1, if $\alpha_\delta(c, f) \ge \alpha$ for some $c$, $f$ and $\delta$, using the framework of Section 3 we
immediately obtain an $(\alpha - \delta - \epsilon)$-approximation algorithm (for any $\epsilon > 0$, with running time
$O(n^{\mathrm{poly}(1/\epsilon)})$), and similarly for $\alpha_\delta(c)$.

By computer-assisted case analysis, we are able to prove the following two theorems, lower 
bounding the approximation ratios of our two types of rounding on smooth configurations. First, 
we are able to justify the performance of our first algorithm as presented in Section 4. 

Theorem 7.2. For $c = 0.86451$ and $\delta = 10^{-5}$, we have $\alpha_\delta(c) \ge 0.87362$.

The proof of Theorem 7.2 consists of roughly 20 million cases and the theorem takes about 9
minutes to prove on a SunFire X2270 machine with Intel X5675 CPUs.

Most importantly, the next theorem implies our improved approximation guarantee, as described 
in Theorem 1.1. 

Theorem 7.3. For $c = 0.8056$, $f(x) = 1.618 \max(0, x - 0.478)$ and $\delta = 10^{-5}$, we have $\alpha_\delta(c, f) \ge
0.87762$.

The proof of Theorem 7.3 consists of roughly 140 million cases and the theorem takes about 25 
minutes to prove on a SunFire X2270 machine with Intel X5675 CPUs. 

8 Conclusion and Future Work 

We introduced a new class of rounding algorithms for the Max Bisection problem and Max
Bisect-CSP, extending the work of [RT12]. We analyzed the results to present a 0.8776-approximation
algorithm for Max Bisection and an $(\alpha_{LLZ} - \epsilon)$-approximation algorithm for Max Bisect-2-Sat,
improving on approximation ratios of 0.85 and 0.93 respectively [RT12]. Our improved bound
0.8776 is so far based on extensive numerical evidence, but we are currently working on a formal
proof of this bound. Our algorithm for Max Bisect-2-Sat is optimal assuming the Unique Games
Conjecture and the ratio of our algorithm for Max Bisection is off from the UGC-hardness threshold
by less than $10^{-3}$. The most obvious open question is to close this small gap. We conjecture
that there is an $(\alpha_{GW} - \epsilon)$-approximation algorithm for Max Bisection, i.e., it has the same
approximation threshold as Max Cut.

It is worth noting that there are constraint satisfaction problems where adding a bisection
constraint makes the problem strictly harder. A natural example is Min Cut, which is solvable
in polynomial time, but its bisection variant, Min Bisection, is NP-hard to solve exactly and
R3SAT-hard to approximate within a factor 4/3 [Fei02].

It would be interesting to come up with a generic algorithm family that provides the best
approximation algorithm for all Max Bisect-CSP(P) problems. In particular, while the seminal
work of Raghavendra [Rag08] shows that, assuming the UGC, for any predicate P the best
approximation algorithm for Max CSP(P) is to run a certain rounding scheme on its natural SDP
relaxation, there is no analog for Max Bisect-CSP(P). Notice that [Rag08] does not provide a
(practical) way to compute the approximation factor of this algorithm and just proves its optimality,
hence a parallel result for Max Bisect-CSP(P) would be incomparable to the current paper
and [RT12].

A Proofs of Some Properties of the Bivariate Gaussian Distribution

In this section we prove some lemmata from Section 2.2.

Lemma 2.6 (restated). For every $\rho \in [-1, 1]$, $q_1, q_2 \in [0, 1]$, we have

$$\Gamma_\rho(1 - q_1, 1 - q_2) = \Gamma_\rho(q_1, q_2) + 1 - q_1 - q_2.$$

Proof. Define $(X, Y)$ as a pair of jointly Gaussian random variables each of which has mean 0 and
variance 1, where $\mathrm{Cov}(X, Y) = \rho$. From the definition of $\Gamma$ we have,

$$\Gamma_\rho(1 - q_1, 1 - q_2) = \Pr[X \le \Phi^{-1}(1 - q_1) \wedge Y \le \Phi^{-1}(1 - q_2)]$$
$$= \Pr[-X \ge -\Phi^{-1}(1 - q_1) \wedge -Y \ge -\Phi^{-1}(1 - q_2)].$$

Observing that $\Phi^{-1}(1 - q_1) = -\Phi^{-1}(q_1)$ and $(-X, -Y)$ has exactly the same distribution as $(X, Y)$,

$$= \Pr[X \ge \Phi^{-1}(q_1) \wedge Y \ge \Phi^{-1}(q_2)] = 1 - \Pr[X < \Phi^{-1}(q_1) \vee Y < \Phi^{-1}(q_2)]$$
$$= 1 - \left(\Pr[X < \Phi^{-1}(q_1)] + \Pr[Y < \Phi^{-1}(q_2)] - \Pr[X < \Phi^{-1}(q_1) \wedge Y < \Phi^{-1}(q_2)]\right)$$
$$= 1 - q_1 - q_2 + \Pr[X \le \Phi^{-1}(q_1) \wedge Y \le \Phi^{-1}(q_2)] = 1 - q_1 - q_2 + \Gamma_\rho(q_1, q_2),$$

where we have used inclusion-exclusion and the fact that $\Pr[X = \Phi^{-1}(q_1)] = \Pr[Y = \Phi^{-1}(q_2)] = 0$ in the last two lines
of the proof. □

Lemma 2.7 (restated). For any $\rho \in [-1, 1]$, $q_1, q_2 \in [0, 1]$, we have

$$\Gamma_\rho(q_1, q_2) \le q_1 q_2 + 2|\rho|.$$

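Proof. Let $t_1 = \Phi^{-1}(q_1)$, $t_2 = \Phi^{-1}(q_2)$. We may assume $|\rho| < 1/2$ since otherwise the Lemma is
trivially true. For any $x \in \mathbb{R}$, we have

$$\Pr[Y \le t_2 \mid X = x] = \Phi\left(\frac{t_2 - \rho x}{\sqrt{1 - \rho^2}}\right)$$
$$\le \Phi(t_2 - \rho x) + \frac{1/\sqrt{1 - \rho^2} - 1}{\sqrt{2\pi e}} \qquad \text{(by Lemma A.1, (12))}$$
$$\le \Phi(t_2) + \frac{|\rho x|}{\sqrt{2\pi}} + \frac{1/\sqrt{1 - \rho^2} - 1}{\sqrt{2\pi e}} \qquad \text{(by Lemma A.1, (11))}$$
$$\le \Phi(t_2) + \frac{|\rho|\,|x|}{\sqrt{2\pi}} + \frac{\rho^2}{\sqrt{2\pi e}} \qquad \text{(by } |\rho| < 1/2\text{)}$$

From this we conclude that

$$\Gamma_\rho(q_1, q_2) = \int_{x = -\infty}^{t_1} \phi(x)\, \Phi\left(\frac{t_2 - \rho x}{\sqrt{1 - \rho^2}}\right) dx \le \int_{x = -\infty}^{t_1} \phi(x) \left(\Phi(t_2) + \frac{\rho^2}{\sqrt{2\pi e}} + \frac{|\rho|\,|x|}{\sqrt{2\pi}}\right) dx$$
$$\le q_1 q_2 + \frac{\rho^2}{\sqrt{2\pi e}} + \frac{|\rho|}{\sqrt{2\pi}} \int_{x = -\infty}^{\infty} |x|\, \phi(x)\, dx \le q_1 q_2 + |\rho| + |\rho|,$$

where in the last step we have used $\mathbb{E}[|X|] = \sqrt{2/\pi}$ for $X \sim N(0, 1)$. This completes the proof. □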

The preceding proof used two standard anti-concentration bounds for Gaussians, as summarized 
by the following Lemma. 

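Lemma A.1. Let $X \sim N(0, 1)$ be a standard Gaussian random variable, and $t \in \mathbb{R}$, $a \le b$, $\alpha \ge 1$
real numbers. Then,

$$\Pr[a \le X \le b] \le (b - a) / \sqrt{2\pi}, \qquad (11)$$

$$\Pr[X \le \alpha t] \le \Pr[X \le t] + \frac{\alpha - 1}{\sqrt{2\pi e}}. \qquad (12)$$

Proof. To prove (11) observe that

$$\Pr[a \le X \le b] = \frac{1}{\sqrt{2\pi}} \int_{x = a}^{b} e^{-x^2/2}\, dx \le \frac{1}{\sqrt{2\pi}} \int_{x = a}^{b} dx = \frac{b - a}{\sqrt{2\pi}}.$$

Proceeding to (12), the case $t \le 0$ holds trivially. For $t > 0$ we have

$$\Pr[t \le X \le \alpha t] = \frac{1}{\sqrt{2\pi}} \int_{x = t}^{\alpha t} e^{-x^2/2}\, dx \le \frac{1}{\sqrt{2\pi}} \int_{x = t}^{\alpha t} e^{-t^2/2}\, dx = \frac{(\alpha - 1)\, t\, e^{-t^2/2}}{\sqrt{2\pi}} \le \frac{\alpha - 1}{\sqrt{2\pi e}},$$

where the last inequality follows because the derivative of the function $f(t) = t e^{-t^2/2}$ is $(1 - t^2) e^{-t^2/2}$,
hence $f(t)$ achieves its maximum at $t = 1$. □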
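Both bounds are easy to sanity-check numerically on a grid using the Gaussian CDF from the Python standard library. The sketch below is a spot check under the reading of (11) and (12) as $\Pr[a \le X \le b] \le (b - a)/\sqrt{2\pi}$ and $\Pr[X \le \alpha t] \le \Pr[X \le t] + (\alpha - 1)/\sqrt{2\pi e}$, not a proof; the helper names are introduced here.

```python
import math
from statistics import NormalDist

STD = NormalDist()

def slack_11(a, b):
    """(b - a) / sqrt(2 pi) - Pr[a <= X <= b]; nonnegative when (11) holds."""
    return (b - a) / math.sqrt(2 * math.pi) - (STD.cdf(b) - STD.cdf(a))

def slack_12(t, alpha):
    """Pr[X <= t] + (alpha - 1) / sqrt(2 pi e) - Pr[X <= alpha t]; nonnegative when (12) holds."""
    return STD.cdf(t) + (alpha - 1) / math.sqrt(2 * math.pi * math.e) - STD.cdf(alpha * t)

grid = [k / 10 for k in range(-40, 41)]
worst_11 = min(slack_11(a, b) for a in grid for b in grid if a <= b)
worst_12 = min(slack_12(t, 1 + k / 10) for t in grid for k in range(0, 31))
print(worst_11, worst_12)
```

Both minima are zero up to floating-point error, attained at the trivial cases $a = b$ and $\alpha = 1$.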
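We now prove Lemma 2.8, repeated here for convenience.

Lemma 2.8 (restated). For every $\rho \in (-1, 1)$, $q_1, q_2 \in [0, 1]$, we have

$$\frac{\partial}{\partial q_1} \Gamma_\rho(q_1, q_2) = \Phi\left(\frac{t_2 - \rho t_1}{\sqrt{1 - \rho^2}}\right),$$

where $t_i = \Phi^{-1}(q_i)$.

Proof. We have

$$\Gamma_\rho(q_1, q_2) = \int_{x = -\infty}^{t_1} \phi(x)\, \Phi\left(\frac{t_2 - \rho x}{\sqrt{1 - \rho^2}}\right) dx,$$

giving

$$\frac{\partial}{\partial q_1} \Gamma_\rho(q_1, q_2) = \frac{\partial t_1}{\partial q_1} \cdot \phi(t_1)\, \Phi\left(\frac{t_2 - \rho t_1}{\sqrt{1 - \rho^2}}\right) = \Phi\left(\frac{t_2 - \rho t_1}{\sqrt{1 - \rho^2}}\right),$$

where the second step used $\frac{d}{dx} \Phi^{-1}(x) = \frac{1}{\phi(\Phi^{-1}(x))}$. □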



